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Abstract 

On  today’s  modem  battlefield,  the  ability  to  adapt  is  critical.  The  asymmetric 
threats  that  we  now  face  require  that  we  have  the  ability  to  evolve  and  field  new  tech¬ 
nology  quickly  and  reliably.  The  Kalman  filter  is  an  important  algorithm  often  used 
in  target  tracking  applications  to  estimate  the  future  behavior  of  a  system  based  on  a 
series  of  past  behaviors.  There  exists  an  urgent  need  to  provide  a  flexible  Kalman  fil¬ 
ter  implementation  in  a  portable  yet  synthesizable  design.  A  one  dimensional  Kalman 
Eilter  algorithm  provided  in  Mat  lab®  is  used  as  the  basis  for  the  Very  High  Speed 
Integrated  Circuit  Hardware  Description  Language  (VHDL)  model.  The  JAVA  pro¬ 
gramming  language  is  used  to  create  the  VHDL  code  that  describes  the  Kalman  filter 
in  hardware  which  allows  for  maximum  flexibility.  The  internal  parameters  of  the  fil¬ 
ter  such  as  process  noise  covariance,  measurement  noise  covariance,  data  width,  and 
data  shape  can  be  adjusted  to  achieve  an  optimal  design  to  fit  any  requirement. 

A  one- dimensional  behavioral  model  of  the  Kalman  Eilter  is  described,  as  well 
as  a  one-dimensional  and  synthesizable  register  transfer  level  (RTL)  model  with  op¬ 
timizations  for  speed,  area,  and  power.  These  optimizations  are  achieved  by  a  focus 
on  parallelization  as  well  as  careful  Kalman  filter  sub-module  algorithm  selection. 
Newton- Raphson  reciprocal  is  the  chosen  algorithm  for  a  fundamental  aspect  of  the 
Kalman  filter,  which  allows  efficient  high-speed  computation  of  reciprocals  within  the 
overall  system.  The  Newton-Raphson  method  is  also  expanded  for  use  in  calculat¬ 
ing  square-roots  in  an  optimized  and  synthesizable  two-dimensional  VHDL  imple¬ 
mentation  of  the  Kalman  filter.  The  two-dimensional  Kalman  filter  expands  on  the 
one-dimensional  implementation  allowing  for  the  tracking  of  targets  on  a  real-world 
Gartesian  coordinate  system. 

An  additional  goal  of  this  research  is  to  perform  an  investigation  and  charac¬ 
terization  of  how  to  realize  optimal  real-time  target  tracking  algorithms  in  hardware, 
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such  as  FPGAs  or  ASICs,  while  satisfying  real-time  throughput  and  bandwidth  re¬ 
quirements. 
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Characterization  and  Implementation  of  a 
Real-World  Target  Tracking  Algorithm 
ON  Field  Programmable  Gate  Arrays 
WITH  Kalman  Filter  Test  Case 

I.  Introduction 

1 . 1  Chapter  Overview 

In  section  1.2  a  brief  motivation  for  this  thesis  is  provided.  The  problem  statement 
and  scope  of  the  research  is  presented  in  Sections  1.3  and  1.4  respectively. 

1.2  Research  Motivation 

In  todays  military,  the  need  to  adapt  quickly  has  become  paramount.  Quick 
adaptation  means  fielding  new  technologies  quickly,  efficiently,  and  reliably.  In  the 
past,  designing,  testing,  and  fabricating  new  integrated  circuit  technology  was  a  long 
and  expensive  process.  The  goal  of  this  research  is  to  speed  up  this  process  with 
respect  to  target  tracking  technologies.  This  thesis  characterizes  a  set  of  real-time 
implementation  requirements  of  a  Kalman  filter  for  implementation  on  an  FPGA  and 
discusses  various  means  to  optimize  implementations  via  scalable  architectures.  It  is 
through  the  use  of  FPGAs  that  the  design  process  can  be  minimized  both  in  time  as 
well  as  expense.  By  specifying  a  system  in  a  Hardware  Description  Language  (HDL) 
and  using  that  description  to  quickly  compare  alternatives  in  design,  as  well  as  testing 
for  correctness,  tremendous  time  and  expense  can  be  saved  over  traditional  hardware 
prototyping.  A  design  process  that  used  to  take  years  can  now  be  reduced  to  as  little 
as  a  few  months. 
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1.3  Problem  Statement 


The  Kalman  filter,  an  optimal  linear  estimator,  has  become  widespread  in  its 
applications.  One  notable  use  of  the  filter  is  in  target  tracking;  where  noisy,  inaccu¬ 
rate  sensor  data  makes  the  filter  invaluable.  As  described  above,  the  need  to  adapt 
technology  quickly  has  led  the  Air  Force  towards  the  use  of  more  economical,  and 
flexible  technologies  such  as  the  FPGA.  This  thesis  brings  these  two  technologies 
together,  the  Kalman  filter,  and  FPGA  by  implementation  of  the  Kalman  filter  in 
VHDL  for  use  on  FPGAs.  The  development  of  a  clear  set  of  guidelines  allows  for 
the  derivation  of  a  solution  given  the  problem  characteristics.  In  the  case  of  a  real¬ 
time  target  tracking  algorithm  one  might  look  at  parallelizability  and  pipelinability 
as  characteristics  of  the  problem.  These  characteristics  then  play  an  important  role 
in  determining  a  solution.  In  general  the  characteristics  of  real-time  target  tracking 
is  that  of  an  intense  processing  requirement.  Whether  the  tracking  is  done  by  radar, 
infrared  cameras,  passive  detection,  or  visible-spectrum  cameras  the  amount  of  data 
coming  into  the  system  for  processing  can  be  enormous.  For  this  reason  it  is  essential 
for  the  designer  of  a  target  tracking  system  to  be  able  to  optimize  the  design  for  vari¬ 
ous  optimizations.  In  other  words,  the  designer  needs  to  look  at  optimization  for  such 
system  characteristics  as  speed,  area,  and  power  consumption.  Often  times  secondary 
considerations  might  be  reusability  of  code  or  design  flexibility  (i.e.  the  ability  of  a 
design  to  adapt  and  transform  to  fulfill  different  requirements). 

1.4  Research  Scope 

The  scope  of  this  research  is  the  development  and  testing  of  a  Kalman  filter 
in  VHDL  that  satisfies  the  requirements.  These  requirements  are  that  the  VHDL 
model  produce  outputs  that  are  reasonably  close  to  outputs  produced  by  the  Mat  lab® 
Kalman  filter  model.  Also,  a  JAVA  program  will  be  described  that  allows  the  VHDL 
Kalman  filter  model  to  be  more  flexible  in  the  sense  that  a  user  is  able  to  specify 
various  parameters  as  well  as  designate  the  desired  bit-width.  Any  generalizations  in 
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reference  to  model  optimality  are  only  in  reference  to  the  optimality  of  the  Kalman 
hlter  algorithm  itself. 

1 . 5  Thesis  Format 

This  thesis  is  presented  in  five  chapters.  The  motivation  for  the  development 
of  the  VHDL  Kalman  filter  model  is  presented  in  Chapter  1,  along  with  the  problem 
statement  and  assumptions.  Chapter  2  provides  a  brief  history  of  Kalman  filters  as 
well  as  a  brief  discussion  of  real-time  systems.  Details  of  the  problem  solving  approach 
and  considerations  made  during  the  design  process  are  presented  in  Chapter  3,  along 
with  implementation  details  and  detailed  explanations  of  the  various  hardware  stages 
required  in  the  design.  Results  of  the  research  including  testing  and  design  evaluations 
are  presented  in  Chapter  4,  and  conclusions  and  recommendations  for  future  work 
are  found  in  Chapter  5.  Additional  information  including  portions  of  the  VHDL  and 
JAVA  code  are  listed  in  the  appendices  in  order  to  facilitate  application  and  future 
research. 
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II.  Real-Time  Systems  and  the  Kalman  Filter 

2. 1  Chapter  Overview 

This  chapter  discusses  real-time  systems,  and  the  Kalman  filter.  Real-time  sys¬ 
tems  are  discussed  to  demonstrate  some  of  the  requirements  of  implementing  a 
Kalman  filter  as  a  real-time  system.  The  Kalman  filter  is  then  discussed  in  detail  in¬ 
cluding  a  historical  perspective,  current  applications,  and  the  equations  that  describe 
the  iterative  nature  of  the  filter. 

2.2  Real-Time  Systems 

The  Kalman  filter  is  an  optimal  linear  estimator  which,  optimally  estimate  the 
behavior  of  a  system  that  relies  on  noisy  inaccurate  data.  One  example  of  this  is 
in  the  use  of  radar  to  track  aircraft.  The  data  provided  by  the  radar  can  be  noisy 
and  often  full  of  inaccuracies.  It  is  the  job  of  a  linear  estimator  such  as  the  Kalman 
filter  to  optimally  estimate  the  state  of  the  system.  For  target  tracking  purposes,  the 
estimations  provided  by  the  Kalman  filter  need  to  complete  in  real  time.  In  the  case 
of  air  traffic  control,  anything  less  than  real  time  could  prove  disastrous. 

In  order  to  define  what  a  real  time  system  is,  it  is  important  to  first  understand 
what  is  meant  by  ’system’.  In  [18]  a  system  is  defined  as  “a  regularly  interacting  or 
interdependent  group  of  items  forming  a  unified  whole” .  The  formation  of  a  unified 
whole  implies  a  boundary  between  the  system  in  question  and  the  environment  or 
everything  else  around  it  [20].  Real-time  implies  the  need  to  satisfy  timing  constraints. 
Combine  the  two  definitions  and  a  real-time  system  is  a  system  that  has  specific  input, 
output,  and  timing  constraints. 

2.2.1  Problem  or  Implementation  Domination.  When  developing  a  real¬ 
time  system  a  designer  needs  to  consider  whether  the  system  is  problem  dominated 
or  implementation  dominated.  This  essentially  asks  the  question,  where  is  the  most 
difficulty  in  designing  a  system  coming  from?  Does  technology  exist  to  implement 
a  system?  If  so,  then  the  system  is  problem  dominated.  If  technology  doesn’t  exist 
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yet  or  is  cost  prohibitive  then  development  of  a  system  would  be  considered  imple¬ 
mentation  dominated.  When  a  technology  is  just  barely  capable  of  solving  a  type  of 
problem  “the  engineering  of  solutions  in  this  phase  is  inevitably  implementation  dom¬ 
inated”  [20].  In  the  case  of  the  Kalman  Filter  specified  in  this  thesis,  the  technology 
to  build  such  a  system  is  relatively  new  and  arguably  just  capable  of  handling  such 
a  problem;  therefore  the  system  is  implementation  dominated.  In  other  words,  the 
form  of  implementation  becomes  critically  important,  which  is  the  primary  reason  for 
this  research  to  optimize  the  speed  performance  over  area. 

As  mentioned  above,  one  hallmark  of  realtime  systems  is  the  idea  of  timing 
constraints.  These  timing  constraints  can  also  be  thought  of  as  deadlines  that  must 
be  met  or  the  system  fails.  It  can  also  be  stated  as,  a  design  must  satisfy  certain 
timing  requirements.  When  a  design  does  not  satisfy  timing  requirements,  it  means 
that  the  delay  of  the  critical  path  is  greater  than  the  target  clock  period  [21].  The 
critical  path  delay  is  the  largest  delay  between  flip-flops.  It  is  a  combination  of 
several  delays,  including:  clk-to-out  delay,  routing  delay,  setup  timing,  clock  skew, 
and  so  on  [21].  To  illustrate  this,  consider  the  example  of  an  optical  target  tracking 
system  that  is  required  to  track  twenty  thousand  targets  simultaneously.  Assuming 
that  the  system  takes  its  inputs  from  cameras  which  produce  30  frames  per  second; 
the  system  must  be  capable  of  performing: 


30  frames  x  20,000  targets  =  600,000  calculations / second 


Anything  less  than  this  and  the  system  will  fail  due  to  an  inability  to  meet  the  timing 
requirements.  If  the  system  is  running  on  an  FPGA  that  runs  at  a  maximum  speed 
of  40MHz  then  the  maximum  time  allowed  per  calculation  would  be: 


40,000,000  cycles / second 
600,000  calculations / second 


66.66  cycles / calculation 
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Any  design  that  is  unable  to  do  at  least  one  calculation  every  66.66  cycles  would  not 
meet  the  real-time  requirements. 

2.3  The  Kalman  Filter 

The  Kalman  Filter  is  a  means  to  predict  the  future  behavior  of  a  system  based 
on  past  behavior.  A  system’s  past  behavior  is,  in  a  way,  remembered  and  used 
along  with  measurements  to  make  the  predictions  of  how  the  system  might  behave 
in  the  future.  According  to  [13]  the  reason  that  tools  such  as  the  Kalman  Filter  are 
useful  to  a  designer  is  because  virtually  all  systems  are  non-deterministic.  In  other 
words,  few  if  any  systems  are  devoid  of  randomness  or  stochastic  behavior.  Whether 
a  system  inherently  contains  stochastic  processes  or  the  environment  that  may  act 
upon  a  system  is  itself  stochastically  governed  it  inevitably  is  non-deterministic  [13]. 
According  to  Maybeck  ’’When  considering  system  analysis  or  controller  design,  the 
engineer  has  at  his  disposal  a  wealth  of  knowledge  derived  from  deterministic  system 
and  control  theories.”  He  goes  on  to  say: 

There  are  three  basic  reasons  why  deterministic  systems  and  control  the¬ 
ories  do  not  provide  a  totally  sulRcient  means  of  performing  this  analysis 
and  design.  First  of  all,  no  mathematical  model  is  perfect. ..A  second 
shortcoming  of  deterministic  models  is  that  dynamic  systems  are  driven 
not  only  by  our  own  control  inputs,  but  also  by  disturbances  which  we  can 
neither  control  nor  model  deterministically... A  final  shortcoming  is  that 
sensors  do  not  provide  perfect  and  complete  data  about  a  system  [13]. 

It  is  naive  and  often  inadequate  to  assume  that  a  designer  can  have  perfect  control 
of  all  of  a  systems  parameters  as  well  as  the  environment  acting  upon  it  [13].  This 
is  why  the  Kalman  filter,  an  optimal  linear  estimator,  has  become  so  important  and 
widespread  in  the  technology  of  today.  The  Kalman  filter  takes  inaccurate,  incom¬ 
plete,  and  noisy  data  combined  with  environmental  disturbances  beyond  a  designers 
control  and  over  time  develops  an  optimal  estimate  of  desirable  quantities. 

2.3.1  Historical  Perspective.  Historically  the  Kalman  filter  owes  its  origins 
to  a  time  long  preceding  that  of  Rudolf  Emil  Kalman,  the  coinventor  of  the  Kalman 
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filter.  The  date  was  1  January  1801  when  an  astronomer  by  the  name  of  Giuseppe 
Piazzi  discovered  Ceres  (a  dwarf  planet  located  in  the  Solar  Systems  asteroid  belt). 
He  tracked  the  new  planet  for  several  days  before  illness  interrupted  his  observations. 
After  reporting  his  discoveries  other  astronomers  were  forced  to  wait  due  to  solar 
glare  before  attempting  themselves  to  find  Ceres  [10].  Attempts  to  locate  it  were 
unsuccessful  and  it  proved  too  difficult  for  them  to  predict  its  exact  position.  To 
locate  Ceres,  Carl  Friedrich  Gauss,  a  mere  24  years  old  at  the  time,  developed  a 
method  called  least-squares  analysis  and  successfully  predicted  the  position  of  Ceres 
using  this  method  [2].  The  method  of  least-squares  analysis  is  an  early  example  of 
an  estimator  that  estimates  the  state  of  a  dynamic  system  from  incomplete  and  noisy 
measurements  just  as  the  Kalman  filter  does.  However,  it  was  R.  E.  Kalman  who 
developed  and  proved  the  Kalman  filter  is  an  optimal  system-state  estimator  that 
minimizes  the  estimated  error  covariance  [21]. 

2.3.2  Applications  of  the  Kalman  Filter.  The  Kalman  filter  is  an  estimation 
tool  which  has  been  applied  over  a  wide  variety  of  disciplines.  Key  applications  of 
the  Kalman  filter  are  inertial  navigation,  sensor  calibration,  radar  tracking,  manufac¬ 
turing,  economics,  signal  processing,  freeway  traffic  modeling,  and  target  tracking  in 
general  [6]. 

2.4  Characterization  and  Implementation  of  the  Kalman  Filter 

This  section  discusses  the  Kalman  filter  equations  in  detail  and  then  relates  those 
equations  to  the  Matlab®  equations  used  as  the  basis  for  the  VHDL  implementation. 

2.4.1  Preliminary  Definitions. 

1.  a  priori:  knowledge  that  is  independent  of  experience. 

2.  a  posteriori:  knowledge  that  is  dependent  on  experience. 

The  Kalman  Filter  attempts  to  estimate  the  state  a:  G  of  a  discrete  time 
controlled  process,  where  x  is  the  state  which  is  contained  in  the  set  of  all  reals  and 
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is  of  dimension  n  [21],  The  dimension  could  be  the  x,  y  on  a  coordinate  plane  or  the 
number  of  variables  involved  in  describing  the  state  of  a  target  (e.g.  position,  and 
temperature) . 

2.4.2  Equations  and  Explanations.  The  Kalman  hlter  estimates  the  state  of 
a  discrete-time  controlled  process  governed  by  the  linear  stochastic  difference  equation 
2.1  [21], 

Xk  =  Axk-i  +  Buk-i  -\-  Wk-i  (2-1) 

In  Equation  2.1  the  n  x  n  matrix  A  relates  the  state  x  at  the  previous  time  step 
k  —  1  to  the  current  state  of  x.  That  is  to  say,  because  we  assume  a  linear  relationship 
between  Xk  and  Xk-i  the  relationship  can  be  dehned  as  Xk  =  Axk-i  in  the  absence  of 
a  driving  function  Bu^  and  process  noise  [21].  It  is  important  to  note  that  in  this 
thesis  it  is  assumed  that  A  remains  constant  even  though  it  is  possible  that  it  might 
change  with  each  time  step  [21]. 

In  equation  Equation  2.1  the  n  x  I  matrix  B  relates  the  control  input  u  to  the 
state  X  [21].  Eor  the  Kalman  hlter  used  in  this  thesis  there  are  no  control  inputs  so 
B  and  were  ignored. 

The  process  noise  is  any  internally  occurring  noise.  That  is,  no  system  is 
perfect  and  all  systems  suffer  from  internal  noise.  For  example,  if  a  Kalman  hlter  were 
to  be  made  to  predict  the  state  of  an  electronic  circuit  the  noise  might  be  voltage 
huctuations  inside  the  circuit  due  to  imperfections  caused  during  the  manufacturing 
process.  It  is  important  to  remember  that  for  the  Kalman  hlter  w  is  assumed  to  be 
white  noise  with  a  normal  probability  distribution. 

p{w)r^N{0,Q)  (2.2) 

If  these  assumptions  are  not  true  for  any  particular  system  then  the  Kalman  hlter 
becomes  less  than  optimal. 


The  system  measurement  equation  is 


Zk  =  Hxk  +  Vk  (2.3) 

where 

.2  G  E"*  (2.4) 

The  mx  n  matrix  H  in  measurement  equation  2.3  relates  the  state  x  to  the  measure¬ 
ment  Zk-  As  with  A  from  equation  2.1  H  is  assumed  to  be  constant  even  though  in 
practice  it  might  change  with  each  time  step  [21].  Any  sensor  hooked  up  to  a  system 
will  have  inherent  noise  included  in  its  signals.  Random  variable  Vk  represents  the 
measurement  noise  [21]. 

p{v)r^N{0,R)  (2.5) 

As  with  the  process  noise,  the  measurement  noise  is  assumed  to  be  white  with  a 
normal  probability  distribution. 

Equations  2.2  and  2.5  have  process  noise  covariance  Q  and  measurement  noise 
covariance  R  respectively. 

When  deriving  equations  for  the  Kalman  filter  the  goal  is  to  hnd  an  equation 
that  computes  an  a  posteriori  state  estimate  x^  as  a  linear  combination  of  an  a  priori 
estimate  aiji  as  shown  below  in  equation  2.6  [21]. 

Xk  =  xl  K{zk- Hx~^)  (2.6) 

Part  of  equation  2.6,  (zk  —  Hx^)  called  the  measurement  innovation,  or  the  residual. 
This  measures  the  difference  between  the  actual  measurement  Zk  and  the  predicted 
measurement  Hx^.  Any  part  of  an  equation  seen  with  a  ”hat”,  e.g.  x,  represents  an 
estimate. 

What  makes  the  Kalman  hlter  optimal  is  the  calculation  of  the  value  K.  This 
value  is  called  Kalman  gain  or  blending  factor.  This  was  derived  to  minimize  the 
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Time  Update 
(“Predict”) 


Measurement  Update 
(“Correct”) 


Figure  2.1:  Discrete  Kalman  filter  cycle.  The  time  update  projects  the  current  state 
estimate  ahead  in  time.  The  measurement  update  adjusts  the  projected  estimate  by 
an  actual  measurement  at  that  time  [21]. 

Table  2.1:  Discrete  Kalman  filter  time  update  equations 

x'^  =  Axk-i  +  Buk  (2.8) 

Pk  =  APk-iA^  +  Q  (2-9) 

a  posteriori  error  covariance  [21].  Detailed  explanation  or  derivation  of  K  is  beyond 
the  scope  of  this  thesis. 


Kk  =  P^H^{HP-H^  +  R)-^ 


(2.7) 


Where  is  the  a  priori  estimate  error  covariance  and  Pk  is  the  a  posteriori  estimate 
error  covariance  [21]. 

The  Kalman  filter  consists  of  two  stages  of  equations.  Figure  2.1  shows  the 
two  stages  commonly  called  the  time  update  or  predict  stage  and  the  measurement 
update  or  correct  stage.  The  time  update  equations  can  be  seen  in  Table  2.1  and  the 
measurement  update  equations  can  be  seen  in  Table  2.2  on  page  11 
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Table  2.2:  Discrete  Kalman  filter  measurement  update  equations 


K,  =  P^H^iHP^H^  +  R)-^ 

(210) 

Xk  =  x'^  +  K{zk  -  Hx^) 

(211) 

Pfc  =  (1  -  KkH)P^ 

(212) 

Figure  2.2:  A  complete  picture  of  the  operation  of  the  discrete  Kalman  filter  [21]. 
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CO  l> 


Listing  II.  1: 


for  k  =  2 : t _last 

x( : ,k)=  phi*x( : ,k-l) ; 
P ( :  , :  , k) =  phi*P  (  :  ,  :  ,  k 

-1) *phi  ’ 

+  Qd ; 

> 

II 

* 

* 

+  R ; 

K=P(: , : ,k)*H’*(inv(A( 

: ,k))) ; 

residual (:, k) =  z(:,k) 

-  H*x(: 

,  k)  ; 

x(:,k)=  x(:,k)  +  K*residual ( : 

,  k)  ; 

P( : , : ,k)=  P( : , : ,k)  - 

K*H*P( :  , 

:  ,  k)  ; 

sigma.f  (  :  , k) =sqrt (diag (P ( :  ,  :  , 

k)))  ; 

KK(: ,k)=K; 

end  "/o  End  time  loop 

2. 4-. 3  Kalman  Filter  Matlab®  Code.  The  code  seen  in  Listing  II.l  is  de¬ 
signed  to  produce  data  that  can  be  plotted  using  Matlab®  .  The  data  is  stored  in 
large  matrices;  this  was  done  solely  to  allow  for  the  generation  of  data  plots  and  is 
not  part  of  the  Kalman  filter  algorithm.  A  real-world  implementation  of  a  Kalman 
filter,  such  as  that  described  in  this  thesis,  does  not  have  to  store  information  beyond 
the  previous  estimation.  The  Matlab®  variable  x{:,k)  stores  k  column  vectors  for  the 
Kalman  filter  equations  variable  Xk  and  That  is,  the  estimate  x  for  both  the  time 
update  stage  as  well  as  the  measurement  update  stage. 

2.4-3. 1  Matlab®  Code  Association  with  Kalman  Filter  Equations. 

The  following  section  contains  a  comparison  and  association  of  the  Matlab®  code 
and  equations  for  the  Kalman  filter.  Table  2.3  on  page  13  shows  how  the  equations 
for  the  Kalman  filter  are  paired  up  with  those  from  the  Matlab®  code.  Note,  as  shown 
in  Listing  11.2,  the  Kalman  gain  K  is  calculated  in  two  separate  calculations. 


Listing  11.2: 


A ( : , k) = 

H*P(: , : ,k)*H> 

+  R ; 

II 

,k)*H’*(inv(A(; 

: ,k))) ; 

This  is  also  true  for  the  system  state  Xk  calculated  during  the  measurement 
update  stage(see  Listing  11.3). 
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Listing  11. 3: 

r esidual  (  :  , k) =  z(:,k)  -  H*x(:,k); 

x(:,k)=  x(:,k)  +  K* r es idual  (  :  , k)  ; 


Table  2.3:  Comparison:  Matlab®  code  and  Kalman  filter  equations 


=  Axk-i  +  Buk  x{:,  k)  =  phi  *  x{\^  k  —  1) 


=  APk-iA^  +  Q  P{:,  :,k)  =  phi  *  P{:, :,  fc  -  1)  *  phi'  +  Qd 


Kk  =  P^H^iHP^H^  +  R)-^  ^K  =  P(:, :,  k)  *  H' *  {inv{A{:,  k))) 


“  ^(b  k)  +  K  *  residual{:,  k) 


Pk  =  {l-  KkH)P^  ^  P(:, :,  k)  =  P(:,  :,k)  -  K  ^  H  ^  Pi:,:,k) 
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III.  Approach 


3. 1  Chapter  Overview 

This  chapter  contains  the  problem  definition,  the  goals  and  hypothesis,  and  a 
discussion  on  number  representation  format.  It  also  discusses  the  design  process 
used  to  develop  the  VHDL  Kalman  filter  implementations  including  behavioral,  and 
Register  Transfer  Level  (RTL)  models. 

3.2  Problem  Definition 

This  thesis  characterizes  a  set  of  real-time  implementation  requirements  of  a 
Kalman  filter  for  implementation  on  an  FPGA  and  discusses  various  means  to  opti¬ 
mize  the  implementation  via  scalable  architectures. 

3.2.1  Goals  and  Hypothesis.  The  primary  goal  of  this  design  is  to  maximize 
speed/throughput  as  much  as  possible.  As  a  secondary  goal,  minimization  of  area 
and  power,  will  also  be  considered. 

In  general,  the  requirement  for  a  digital  system  must  be  specified  and  categorized 
in  order  to  determine  hardware  requirements.  In  other  words,  system  requirements 
must  be  paired  up  with  appropriate  hardware  capabilities.  There  are  many  different 
FPGAs  with  varying  capabilities  for  a  designer  to  consider.  Which  FPGA  is  best 
suited  for  a  particular  design  depends  on  the  specific  design  specifications.  System 
specifications  can  include  many  different  requirements.  For  example,  power,  speed 
and  area  may  need  to  be  balanced  in  a  weighted  fashion  to  achieve  the  required 
results.  How  to  achieve  this  is  the  question.  Power  speed  and  area  are  all  intertwined, 
changing  one  will  inevitably  cause  changes  to  another.  For  example,  as  area  increases, 
often  times  power  requirements  will  also  increase  as  a  result  of  the  increased  number 
of  transistors  being  utilized.  If  a  low  power  design  is  required  it  may  be  necessary 
to  make  speed  concessions  that  may  also  result  in  a  design  requiring  less  area.  It 
is  considerations  such  as  these  that  determine  the  most  appropriate  combination  of 
implementation  and  hardware  to  satisfy  design  requirements. 
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The  traditional  VLSI  design  process  is  costly  both  in  time  and  money.  Because 
of  this  it  also  involves  a  tremendous  amount  of  risk. 

The  traditional  design  process  includes: 

•  Design 

•  Verihcation  and  testing 

•  Prototyping 

Each  of  these  steps  may  have  to  be  visited  several  times  as  the  process  is  iterative 
in  nature.  As  the  design  process  moves  forward  problems  can  force  designers  to 
backtrack  in  the  design  process  which  will  always  add  to  the  total  cost.  Tremendous 
man-hours  are  required  along  with  an  enormous  associated  cost. 

This  thesis  proposes  a  system  that  allows  a  designer  to  greatly  reduce  the  time 
needed  for  design,  verification,  and  testing;  as  such,  the  risk  factor  is  also  reduced 
tremendously.  In  terms  of  the  case  study,  specifications  for  a  design  requiring  or 
benefiting  from  the  use  of  a  Kalman  Filter  can  be  entered  into  the  system  and  an 
efficient,  and  optimized  hardware  description  suitable  for  implementation  on  an  FPGA 
or  Application-Specific  Integrated  Circuit  (ASIC)  is  automatically  generated. 

3.3  Number  Representation  Format 

With  respect  to  the  design  of  the  Kalman  filter  in  VHDL  it  is  necessary  to 
choose  a  method  of  representing  both  the  integer  and  fractional  portion  of  a  number. 
Floating  point  is  an  option  that  provides  a  large  dynamic  range  but  with  relatively 
poor  precision.  After  examination  of  the  Matlab®  code  it  was  decided  that  dynamic 
range  [14]  was  not  the  main  issue  and  that  an  alternative  to  floating  point  could 
prove  to  be  easier  for  implementation  purposes  as  well  as  provide  advantages  in  speed 
of  computation.  Fixed  point  was  chosen  due  to  the  precision  it  allows  for  and  the 
speed  benefits  it  affords.  If  an  application  can  be  done  in  fixed-point  arithmetic,  it 
will  probably  run  faster  than  in  any  other  format  because  fixed  point  is  the  natural 
language  of  the  processor  [14] . 
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Traditionally  the  choice  would  have  to  be  made  as  to  where  to  place  the  radix; 
how  many  bits  to  the  right  and  left  of  the  radix  would  be  required.  The  approach  of 
this  thesis  to  design  allows  a  designer  to  make  this  kind  of  decision  at  the  end  of  the 
design  process  after  testing  gives  further  insight  into  the  most  appropriate  format. 
Using  Java  code,  abstraction  of  the  Kalman  filter  internal  parameters  is  performed, 
thus  allowing  those  abstracted  parameters  to  be  defined  by  the  designer  whenever 
most  convenient  or  practical.  For  initial  testing  purposes  all  numbers  are  represented 
by  a  32-bit  fixed-point  binary  number,  and  for  secondary  testing  a  64-bit  fixed-point 
binary  representation  is  used.  The  32-bit  fixed-point  representation  uses  15  bits  for 
the  integer  portion,  16  bits  for  the  fractional  portion  and  one  bit  for  the  sign.  The 
15-bit  integer  can  display  numbers  as  large  as  32,767  and  as  small  as  -32,768.  The 
fractional  16  bits  can  display  a  fraction  as  small  as  1.52587890625  x  10“®  or  as  large 
as  9.999847412109375  x  10“h  The  64-bit  fixed-point  representation  uses  31  bits  for 
the  integer  portion,  32  bits  for  the  fractional  portion  and  one  bit  for  the  sign.  The 
31-bit  integer  can  display  numbers  as  large  as  2^^  or  2, 147,483,647  and  as  small  as 
_232  ^  231  or  —2, 147, 483,  648.  The  fractional  32  bits  can  display  a  fraction  as  small 
as  2.32830643654  x  10-i°  or  as  large  as  9.999999997671694  x  lO"!.  Table  3.3  on 
page  16  shows  the  format  used  for  number  representation.  In  the  case  of  the  32-bit 
representation,  bits  0  through  15  are  used  to  represent  the  fractional  portion  of  the 
number,  bits  16  through  30  represent  the  integer,  and  bit  31  is  the  sign  bit. 


Table  3.1:  Number  representation  formats 


bits: 


bits: 


32-Bit  Representation 


Sign 

integer 

fraction 

31 

30  down  to  1 6 

1 5  down  to  0 

64-Bit  Representation 

sign 

integer 

fraction 

63 

62  down  to  32 

31  down  to  0 
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If  additional  bits  were  available  for  the  fraction,  the  more  accurately  a  decimal 
fraction  can  be  represented  in  binary.  Different  applications  may  require  different 
levels  of  accuracy.  This  research  provides  a  means  for  a  designer  to  implement  a 
Kalman  Filter  that  has  been  optimized  for  speed,  power,  and  area  that  also  maintains 
some  minimum  level  of  accuracy  as  specified  by  the  designer. 

3.4  Design  Optimization 

Design  optimization  can  be  accomplished  in  several  ways  depending  on  what 
type  of  optimization  is  required.  For  the  Kalman  filter  design  described  in  this  thesis, 
optimization  for  speed  is  most  critical.  Parallelization  and  pipelining  are  two  methods 
used  to  help  create  a  hardware  design  that  fulfills  this  requirement.  This  section 
discusses  various  optimizations  and  their  interrelationship.  These  optimizations  or 
means  of  optimization  are  discussed  in  order  of  priority  as  they  relate  to  this  thesis. 

3.4-1  Optimization  by  Parallelization.  Parallelization  in  hardware  is  the 
simultaneous  processing  of  data.  In  modern  day  processors,  such  as  the  Pentium  line 
of  processors,  all  instructions  given  to  the  computer  are  processed  in  series  or  one-at- 
a-time.  This  means  that  even  when  performing  calculations  that  lend  themselves  well 
to  parallelization  they  cannot  take  advantage  of  this.  This  is  one  reason  why  custom 
designs  such  as  the  one  described  in  this  thesis  can  be  much  faster,  and  much  more 
efficient  than  a  multipurpose  microprocessor.  The  custom  design  allows  the  designer 
to  parallelize  or  simultaneously  process  data  to  the  full  extent  allowed  by  an  FPGA 
or  ASIC  of  a  given  capacity.  Figure  3.4.1  on  page  18  shows  a  small  portion  of  the 
Kalman  filter  algorithm  flow  where  MMult  is  matrix  multiplication,  x~  is  the  a  priori 
system  state,  x'^  is  the  a  posteriori  system  state,  P~  is  the  a  priori  estimate  error 
covariance,  is  the  a  posteriori  estimate  error  covariance,  Qd  is  the  process  noise 
covariance,  H  relates  the  state  x  to  the  measurement  and  phi  relates  the  state 
X  at  the  previous  time  step  k  —  1  to  the  current  state  of  x.  Observe  how  multiple 
arithmetic  operations  can  be  performed  simultaneously  or  in  parallel  throughout  the 
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Figure  3.1:  This  figure  shows  a  portion  of  the  Kalman  filter  algorithm  in  flow  chart 
form.  Parallelization  can  be  observed  in  each  cycle;  multiplications  are  performed 
simultaneously.  Data  dependency  can  also  be  observed;  Operation-l  must  complete 
before  Operation_2  can  begin  due  to  its  dependency  on  x{:,k). 

algorithm.  It  also  demonstrates  data  dependencies;  Operation.!  must  complete  before 
Operation_2  can  begin  due  to  its  dependency  on  x{:,k)  being  completed  prior. 

3.4-2  Optimization  by  Pipelining.  Pipelining  is  a  common  method  designers 
have  used  to  speed  up  sequential  processing.  Sequential  processing,  meaning  the  need 
to  run  data  in  order,  is  unfortunately  required  by  some  algorithms.  Pipelining  is  a 
method  commonly  used  to  improve  throughput  of  a  system.  Pipelining  creates  se¬ 
quential  processing  stages  that  once  completed  allow  the  current  data  to  be  forwarded 
to  the  next  stage  and  new  data  to  be  brought  in  for  processing.  Pipelining,  however, 
is  not  always  possible  at  a  particular  level  of  circuit  abstraction  but  may  be  possible 
at  a  lower  or  higher  level.  If  a  particular  operation  requires  a  computation  such  as 
a  multiply  to  be  performed  multiple  times  then  that  multiply  can  be  pipelined.  It 
is  important  to  remember  that  pipelining  improves  throughput,  not  response  time. 
Throughput  is  the  total  amount  of  work  done  in  a  given  time  and  response  time  or 
execution  time  is  the  time  between  the  start  and  completion  of  a  task  [9]. 
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3.4-3  Optimization  for  Speed.  Optimization  for  speed  goes  hand-in-hand 
with  the  aforementioned  optimization  methods.  However,  depending  on  the  type 
of  algorithm  being  implemented,  pipelining  may  or  may  not  improve  performance. 
If  response  time  is  to  be  optimized  then  a  deep  pipeline  may  actually  hinder.  For 
example,  if  an  algorithm  is  heavily  sequential,  that  is  to  say  order  of  computation 
cannot  be  deviated  from,  pipelining  may  actually  hinder  performance.  As  a  general 
rule  of  thumb,  faster  circuits  will  require  more  parallelism,  which  in  turn  increases 
area.  This  is  the  trad-off  between  speed  and  area:  a  faster  circuit  will  require  more 
parallelism  and  therefore  suffer  an  increase  in  area  [12].  The  symbiotic  existence 
between  speed  and  parallelism  unfortunately  does  not  usually  exist  between  speed 
and  area. 

3.4.4  Optimization  of  Area.  Optimization  of  area  may  be  accomplished 
in  several  ways.  Resource  sharing  being  the  method  that  will  be  focused  on  in  this 
thesis.  Resource  sharing  is  the  use  of  a  resource  by  multiple  processes.  For  example, 
if  two  distinct  processes  both  require  a  multiply  of  the  same  dimension  then  they  can, 
at  different  times,  use  the  same  multiplier.  That  is  to  say  that  a  resource  sharing 
design  may  suffer  the  penalty  of  diminished  throughput  unless  the  operations  are 
mutually  exclusive  [12].  Although  resource  sharing  allows  for  the  reuse  of  hardware 
it  typically  will  also  require  a  more  complex  control  system.  Resource  sharing  is  only 
useful  if  the  increased  control  complexity  results  in  a  smaller  area  increase  than  if 
the  resource  were  simply  cloned;  this  is  usually  the  case.  Some  synthesis  tools  that 
support  resource  sharing  can  automatically  optimize  a  design  by  allowing  mutually 
exclusive  operations  to  share  resources.  However,  there  is  no  guarantee  that  a  tool 
will  do  so,  therefore,  to  guarantee  resource  sharing  it  is  part  of  the  RTL  design  for  this 
thesis;  this  allows  for  more  flexibility  when  choosing  a  synthesis  tool.  As  mentioned 
above,  as  a  general  rule  of  thumb,  faster  circuits  require  more  area. 
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Table  3.2:  Matrix  position  designators 


i 

j 

k 

1 

0 

0 

0 

0 

0 

1 

1 

0 

0 

0 

0 

1 

0 

1 

1 

1 

1 

0 

0 

0 

1 

1 

1 

0 

1 

0 

0 

1 

1 

1 

1 

1 

3.5  Behavioral  Model  of  the  Kalman  Filter 

One  of  the  first  steps  towards  a  synthesizable  design  is  to  build  a  behavioral 
model  to  help  determine  correct  function  of  the  Kalman  filter  in  VHDL.  A  full  RTL 
model  is  difficult  and  time  consuming  to  design  and  therefore  verification  of  upper- 
level  behavior  is  most  quickly  and  effectively  done  behaviorally. 

3. 5. 1  Matrix  Multiplication.  Efficiently  multiplying  matrices  is  of  paramount 
importance  for  any  Kalman  Filter  implementation.  The  vast  majority  of  mathemat¬ 
ical  operations  are  performed  on  matrices.  Determination  of  an  efficient  method  for 
muftiplying  matrices  in  VHDL  for  behavioraf  coding  is  presented  next.  The  first  step 
was  to  buifd  a  table  that  shows  the  positions  of  the  individual  parts  of  the  matrices 
that  are  multiplied. 


o 

o 

^01 

X 

Boo 

Boi 

Coo  Coi 

^10 

All  _ 

Bio 

Bn 

_  Cio  Cn  _ 

Table  3.2  is  used  to  help  determine  how  to  set  up  the  nested-for- loops  such  as 
those  seen  in  Listing  Ill.l.  The  position  variables  i,  j,  k,  and  I  represent  the  matrix 
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C*00  =  ^00  X  -^00  +  ^ 
Cqi  =  Aqo  X  Bqi  +  A 
C\Q  =  Aiq  X  Bqq  +  A 
C\\  =  Aiq  X  Bq\  +  ^ 


01  X  Biq 

(3.2) 

01  X  Bii 

(3.3) 

11  X  Biq 

(3.4) 

11  X  Bii 

(3.5) 

position  designators;  e.g.  ^oo  ^  Aj  where  z  =  0  and  j  =  0  and  5oo  Bki  where 
k  =  0  and  I  =  0. 


Listing  III.l: 

1 

--******************************* 

2 

Function  to  multiply  two  2x2  matrices  * 

3 

--*************************** 

4 

Function  mat r ix_mult _2x2  (A,B:  matrix_2x2) 

5 

Return  matrix_2x2  Is 

6 

Variable  result  :  matrix_2x2 ; 

7 

Variable  func_templ  :  Signed(63  Downto  0)  := 

8 

(OTHERS  =>  ’O’) ; 

9 

Begin--Begin  function  code . 

10 

For  i  In  1  to  2  Loop 

11 

For  L  In  1  to  2  Loop 

12 

For  j  In  1  to  2  Loop 

13 

func_templ  :=  ( A (i , j ) *B  (  j  ,  L) )  +  func_templ; 

14 

End  Loop  ; 

15 

result (i,L)  :=  func_templ  (47  Downto  16); 

16 

func_templ  := 

17 

(OTHERS  =>  ’O’) ; 

18 

End  Loop  ; 

19 

End  Loop ; 

20 

Return  result ; 

21 

End  mat r ix_mult _2x2 ; 

If  matrices  A  and  B  are  traversed  left  to  right  from  equation  3.2  through 
equation  3.5  and  the  position  values  recorded  such  as  in  Table  3.2  on  page  20  this 


21 


1 


Listing  III. 2: 

func_templ  :=  (A (i , j ) *B ( j , L) )  +  f unc_templ ; 


information  can  be  used  to  determine  the  for- loops.  Table  3.2  shows  that  for- loop 
variables  j  and  k  are  identical  and  therefore  one  of  them  can  be  left  out  leaving  three 
nested  for- loops,  one  nested  inside  the  other.  Determining  the  order  of  the  for- loop 
variables  is  a  simple  process.  Assuming  that  column  k  is  chosen  to  be  left  out  since  it 
is  identical  to  column  j  then  the  for-loop  variable  i  and  j  are  associated  with  A  and 
j  and  L  are  associated  with  B.  This  determines  the  order  of  the  for-loop  variables  in 
the  code  segment  seen  in  Listing  III. 2. 

3.5.2  VHDL  Types.  Numeric_std  is  the  type  used,  as  signed  numbers  are  re¬ 
quired.  Numeric_std  is  the  standard  VHDL  synthesis  package  along  with  numeric_bit. 
Numeric_std  is  defined  as  unconstrained  arrays  of  stddogic  elements: 


1 

2 


Listing  III. 3: 

Type  unsigned  is  array  (natural  range  <>)  of  std_logic; 
Type  signed  is  array  (natural  range  <>)  of  std_logic ; 


Using  the  numeric_std  type  is  not  always  required,  as  many  VHDL  compilers/synthe- 
sizers  can  synthesize  other  types  such  as  stddogic.  However,  use  of  numeric_std  helps 
to  ensure  synthesis. 

3.5.3  Project  Code.  This  section  presents  a  small  fraction  of  the  behavioral 
VHDL  representation  of  the  Kalman  filter. 

3.5.3. 1  Behavioral  Model.  The  behavioral  model  uses  32-bit  fixed- 
point  binary  numbers.  The  numbers  are  broken  up  into  a  16-bit  integer  portion  and 
a  16-bit  fractional  portion.  The  goal  is  to  achieve  results  that  at  least  approximate 
those  found  using  the  Kalman  filter  Mat  lab®  simulation. 
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Table  3.3:  Constant /variable  approximations 


Constant  /Variable 

Matlab®  Representation 

Fixed-Point  Representation 

dt 

0.1000 

0.099908447265625 

R 

10.0000 

10.0000 

G 

/  0.0000  \ 

V  1.0000  J 

/  0.0000  \ 

V  1.0000  J 

B 

/  0.0000  \ 

V  1.0000  J 

/  0.0000  \ 

V  1.0000  J 

Bd 

/  0.0050  \ 

V  0.1000  J 

(  0.0049896240234375  \ 
0.099908447265625  ) 

H 

(  1.0000  0.0000  ) 

(  1.0000  0.0000  ) 

F 

/  0.0000  1.0000  \ 
0.0000  0.0000  J 

/  0.0000  1.0000  \ 

0.0000  0.0000  ) 

phi 

f  1.0000  0.1000  \ 

0.0000  1.0000  J 

[  1.0000  0.099908447265625  \ 

V  0.0000  1.0000  J 

Qd 

f  0.0333  0.5000  \ 

0.5000  10.0000  J 

[  0.033294677734375  0.5000  \ 

V  0.5000  10.0000  ) 

x(l) 

o  o 
o  o 
o  o 
o  o 

o  o 

o  o 

o  o 

o  o 

p(i) 

f  0.2500  0.0000  \ 
0.0000  0.2500  J 

f  0.2500  0.0000  \ 

0.0000  0.2500  ) 

Due  to  the  limitations  of  having  a  fixed  number  of  bits  to  represent  the  fractional 
portion  of  the  numbers  for  the  behavioral  model,  the  results  are  not  exact  duplicates 
of  the  Matlab®  simulation  results.  However,  the  results  are  very  close  approximations. 
There  are  several  constants  and  variables  that  must  be  set  before  simulation  of  the 
VHDL  Kalman  Filter.  These  constants  and  variable  initializations  are  taken  from  the 
Matlab®  Kalman  filter  code.  To  see  the  fixed  point  representations  of  these  constants 
and  variable  initializations  see  3.3. 


The  behavioral  model  consists  of  11  mathematical  functions  including  various 
matrix  manipulation  algorithms.  It  also  consists  of  various  constants  which  are  pre¬ 
defined  in  the  Matlab®  code.  Here  are  three  examples  of  behavioral  code  used  for 
matrix  manipulation: 
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1 

2 

3 

4 

5 

6 

7 

8 

9 

10 

11 

12 

13 

14 

15 

16 

17 

18 

19 

20 

21 

22 

23 

24 

25 

26 

27 

28 

29 

30 

31 

32 

33 

34 


Listing  III. 4: 


Function  to  add  two  2x2  matrices  * 

Function  mat r ix_add_2x2  (A,B:  matrix_2x2) 

Return  matrix_2x2  Is 
Variable  result  :  matrix_2x2 ; 

Begin--Begin  function  code . 

For  i  In  1  to  2  Loop 

For  j  In  1  to  2  Loop 

result(i,j)  :=  A ( i , j ) +B ( i , j ) ; 
End  Loop  ; 

End  Loop  ; 

Return  result ; 

End  mat r ix_add_2x2 ; 

Function  to  add  a  scalar  to  a  2x2  matrix  * 

Function  matr ix_add_int _2x2  (A:  matrix_2x2  ; 

B:  Signed(31  Downto  0)) 

Return  matrix_2x2  Is 
Variable  result  :  matrix_2x2 ; 

Begin--Begin  function  code . 

For  i  In  1  to  2  Loop 

For  j  In  1  to  2  Loop 

result(i,j)  :=  A(i,j)+B; 

End  Loop  ; 

End  Loop  ; 

Return  result ; 

End  matr ix_add_ int _2x2 ; 

Function  to  multiply  a  2x2  with  a  2x1  matrix  * 
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35 

Function  matrix 

_mult  _2x2_2 

xl 

(A :  mat r ix_ 

2x2 

3 

36 

B :  matr ix_2 

xl) 

37 

Return 

mat r ix_2x 1 

Is 

38 

Var iabl 

e  result  : 

mat 

rix_2xl ; 

39 

Var iabl 

e  func_temp 

1  : 

Signed  (63 

Downto  0)  := 

40 

(OTHERS 

0 

A 

II 

41 

Begin-- 

Begin  funct 

ion 

code  . 

42 

For  i  I 

n  1  to  2  Lo 

op 

43 

For  j  In  1 

to 

2  Loop 

44 

f  unc_ 

tempi  :=  (A 

(i  , 

j ) *B  (  j  )  ) 

45 

+ 

fun 

c_templ ; 

46 

End  Loop  ; 

47 

result  (  i ) 

:  = 

func_templ (47 

Downto  16) 

48 

f unc_t emp 1 

:  = 

49 

(OTHERS  => 

^0 

50 

End  Loop  ; 

51 

Return 

result  ; 

52 

End  matrix_mult 

_2x2_2xl  ; 

Figure  3.2  on  page  26  shows  the  output  for  a  test  ran  on  the  behavioral  Kalman 
filter  with  inputs  generated  by  the  Matlab®  Kalman  filter  code.  All  test  inputs 
into  the  behavioral  VHDL  simulation  resulted  in  outputs  that  closely  approximate 
outputs  generated  by  the  Matlab®  Kalman  filter  code.  Table  3.4  on  page  26  shows 
the  Matlab®  inputs  and  outputs  along  side  their  respective  VHDL  model  inputs  and 
outputs.  The  VHDL  model  inputs  are  different  due  to  the  translation  from  decimal  to 
fixed-point  binary.  The  results  from  the  VHDL  model  are  very  close  approximations 
to  those  calculated  in  Matlab®  .  This  small  difference  can  be  attributed  to  the 
difference  in  the  number  of  bits  used  to  represent  values  inside  each  model. 

3.5.4  The  Reciprocal  Function.  It  is  necessary  to  calculate  the  reciprocal 
(1/A)  as  part  the  calculation  of  K  which  is  the  Kalman  gain  or  blending  factor  [21], 
see  Listing  HI. 5.  Note,  A  is  not  a  fundamental  part  of  the  Kalman  Filter;  it  exists 
only  as  a  sub-calculation  of  K. 
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Figure  3.2:  Behavioral  Model  Simulation  For  Kalman  Filter 


Table  3.4:  Test  results  for  behavioral  model  (outputs  are  for  position) 


Mat  lab®  Input 

Mat  lab®  Output 

Model  Input 

Model  Output 

1.9934 

1.1248 

1.9933929 

1.1247863 

7.0719 

1.5157 

7.0718994 

1.5156707 

4.0100 

1.9602 

4.0099945 

1.9602203 

0.8519 

2.0217 

0.8518981 

2.0217437 

1 


Listing  III. 5: 

K=P(: , : ,k)*H’*(inv(A(: ,k))) ; 


A  problem  arises  when  performing  division  in  binary.  Without  properly  shifting 
the  decimal  place  to  the  left,  the  answer  will  not  be  correct.  To  achieve  the  correct 
answer  the  dividend  must  be  left-shifted  by  the  number  of  bits  that  one  wishes  to 
have  to  the  right  of  the  radix  point.  To  see  this  in  mathematical  terms  please  refer 
to  equation  3.6. 


X  X  2^^ 
y  X  2^® 


^  X  2 


16 


The  following  example  also  illustrates  this. 


(3.6) 


1 

8 


0.125 


(3.7) 


OI.OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO2 

OOOOOOOOOOOOIOOO.OOOOOOOOOOOOOOOO2 


OOOOOOOOOOOOOOOO.OOIOOOOOOOOOOOOO2 

(3.8) 
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In  equation  3.8  the  dividend  is  left-shifted  by  32  places  and  the  divisor  is  left- 
shifted  by  16  places.  This  results  in  an  answer  with  a  16-bit  fractional  portion. 
The  location  of  the  radix  is  actually  only  virtual.  The  division  is  done  without  the 
computer  having  any  sense  of  where  the  radix  point  is  actually  located. 

3.6  Top  Level  Schematic 

The  top  level  schematic  of  the  design  can  be  seen  in  figure  3.6.  This  section 
breaks  down  the  design  process  into  its  constituent  pieces  and  discusses  each  of  those 
pieces. 

3.6.1  The  Kalman  Filter  Equations.  First,  a  close  look  at  the  Matlab® 
equations  that  describe  the  Kalman  filter. 

A  close  examination  of  the  Matlab®  equations  made  it  apparent  that  at  least 
this  particular  implementation  of  the  Kalman  filter  is  highly  order  dependent.  That  is, 
a  majority  of  steps,  i.e.  lines  one  through  eight  in  Figure  3.3  on  page  29,  are  dependent 
on  previous  steps.  For  example,  in  the  following  listing  lines  two  and  three  cannot 
start  until  line  one  has  finished.  Although  it  is  true  that  the  majority  of  the  Kalman 


1 

2 

3 


Listing  III. 6: 

P(:,:,k)=  phi*P ( : , : , k-1) *phi ’  +  Qd ; 
A(: ,k)=  H*P(: , : ,k)*H’  +  R; 

K  =  P(:  , :  ,k)*H’*( inv  (A  ( :  , k) ) ) ; 


filter  algorithm  requires  sequential  processing  there  is  room  to  process  portions  of  the 
code  out  of  order.  The  algorithm  is  broken  up  into  its  constituent  operations.  These 
operations  include:  addition,  subtraction,  multiplication,  and  reciprocal  function. 
In  Figure  3.3  on  page  29  each  calculation  cycle  involves  at  least  one  constituent 
operation.  For  example,  calculation  cycle  1,  which  encompasses  calculation  numbers 
1  and  2,  each  perform  one  multiply.  Because  the  two  multiplies  in  calculation  cycle 
1  are  independent,  in  the  sense  that  the  input  of  one  does  not  depend  on  the  output 
of  the  other,  they  can  be  calculated  at  the  same  time.  Figure  Figure  3.3  on  page  29 
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Listing  III. 7: 


1 

2 

3 

4 

5 


x(:,k)=  phi *x ( : , k-1) ; 

P(:,:,k)=  phi*P  (  :  , :  , k-1) *phi ’  +  Qd ; 
A(: ,k)=  H*P(: , : ,k)*H>  +  R; 

K=P(: , : ,k)*H>*(inv(A(: ,k))) ; 
residual (:, k) =  z(:,k)  -  H*x(:,k); 

x(:,k)=  x(:,k)  +  K*residual ( : , k) ; 

P(:,:,k)=  P(:,:,k)  -  K*H*P ( :  ,  :  , k) ; 


shows  the  order  of  calculation  for  all  constituent  calculations  in  the  Kalman  hlter 
algorithm.  This  can  also  be  seen  in  flow-chart  form  in  Figure  3.5  on  page  34.  It  had 
to  be  decided  how  many  calculations  to  allow  in  parallel  at  any  one  time.  The  more 
parallel  calculations  that  can  be  done  the  fewer  clock  cycles  overall  that  would  be 
required  to  complete  one  iteration  of  the  Kalman  Alter.  Hardware  area  becomes  a 
concern  as  any  parallel  calculation  will  require  additional  area  unless  the  calculations 
are  of  a  different  nature.  That  is,  i.e.  if  two  identical  multiplications  are  to  be  done 
in  parallel  then  double  the  hardware  is  required  verses  if  the  multiplications  were  to 
be  done  sequentially.  In  contrast,  if  an  addition  and  multiply  are  required  for  any 
particular  design  then  there  is  not  an  area  penalty  for  allowing  the  calculations  to 
occur  in  parallel  except  for  possibly  controller  overhead.  In  general,  if  two  calculations 
require  different  hardware  then  there  are  not  any  penalties  for  performing  them  in 
parallel  except  for  possible  controller  overhead. 

3.6. 1.1  Development  of  Figure  3.3.  Figure  3.3  on  page  29  was  used  to 
develop  the  final  RTL  model;  it  stems  from  an  examination  of  the  equations  seen  in 
Listing  III. 7. 
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The  following  references  to  equation  number(s)  (EN)  refer  to  the  line  numbers  in 
Listing  III.  7.  Already  completed  calculations  or  calculations  completed  in  a  previous 
step  are  designated  by  surrounding  them  with  square  brackets  [  ] . 

Equation 

phi  *  x{:,k  —  1) 

EN  1,  consists  of  one  multiply  and  can  therefore  be  completed  in  one  calculation 
cycle.  It  is  designated  to  be  started  and  completed  in  calculation  cycle  1.  The  next 
equation 

phi  *  P(:, A:  —  1)  *  phi'  +  Qd 

EN  2,  contains  three  constituent  calculations  and  therefore  requires  three  calculation 
cycles  to  complete.  It  was  designated  to  be  started  in  calculation  cycle  1.  Calculation 
cycle  1  is 

phi  *  x{:,k  —  1) 
phi  *  P{:,  :,k  —  1) 

Due  to  data  hazards,  at  most  two  parallel  calculations  are  performed  at  any  one  time. 

Because  EN  3  and  EN  4  require  EN  2  to  be  completed  before  they  can  start,  cal¬ 
culation  cycle  2  continues  the  calculation  of  EN  2  as  well  as  beginning  the  calculations 
for  EN  5.  Calculation  cycle  2  is 

[phi  *  P{:, A;  —  1)]  *  phi' 

H*x{:,k) 

Calculation  cycle  3  finishes  calculating  both  EN  2  as  well  as  EN  5.  Calculation 
cycle  3  is 

P(:, :,  k)  =  [phi  *  P(:, A:  —  1)  *  phi']  +  Qd 
residual{:,  k)  =  z{\^  k)  —  [H  ^  x(:,  k)] 
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Calculation  cycle  4  begins  the  calculation  of  EN  3  and  therefore  EN  4  as  well. 
This  is  due  to  the  fact  that  EN  3  is  a  sub-calculation  of  EN  4.  Calculation  cycle  4  is 

Calculation  cycle  5  consists  of  only  a  single  calculation;  no  parallel  calculations 
could  be  performed  at  this  stage.  Calculation  of  EN  3  is  continued.  Calculation  cycle 
5  is 

[H  *P{:,:,k)]*H' 

Calculation  cycle  6  is  a  special  case  requiring  extra  clock  cycles  to  accommodate 
the  reciprocal  calculation.  In  this  calculation  cycle,  the  special  case  of  performing 
two  constituent  calculations  in  series  is  addressed.  First,  the  following  calculation  is 
performed 

[H*P{:,:,k)*H']  +  R 

then  immediately  following  the  completion  of  this  calculation  the  reciprocal  or  inverse 
is  calculated. 

inv{[H*P{:,:,k)*H'  +  R]) 

As  with  calculation  cycle  5,  calculation  cycle  7  consists  of  only  a  single  calcula¬ 
tion.  Calculation  of  the  EN  4  is  finished.  Calculation  cycle  7  is 

K  =  [P(:, :,  k)  *  H']  *  [(mn(A(:,  k)))] 

Calculation  cycle  8  begins  calculation  of  EN  6  and  EN  7  or  the  measurement 
update  stage,  see  Figure  2.2  on  page  11.  Calculation  cycle  8  is 

[K]  *  [residual{:,  A:)] 
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[K]^[H^P{:,:,k)] 


Calculation  cycle  9  completes  calculation  of  EN  6  and  EN  7  and  also  completes 
one  iteration  of  the  Kalman  filter  algorithm.  Calculation  cycle  9  is 

x{:,  k)  =  [a:(:,  k)]  +  [K  *  residual{:,  k)] 

P{:,:,k)  =  [P{:,:,k)]-[K*H^P{:,:,k)] 

Figure  3.4  on  page  30  shows  the  timing  for  the  start  and  finish  of  each  cal¬ 
culation.  Note  that  the  calculation  for  A  from  Listing  III. 7  is  actually  just  a  sub¬ 
calculation  of  K  and  therefore  does  not  appear  in  3.4.  Also,  K  appears  twice  in 
Listing  IIL7  because  portions  of  K  are  calculated  simultaneously;  this  occurs  in  cal¬ 
culation  cycle  4. 

3.6.2  The  Controller.  The  controller  is  the  brains  of  the  entire  system.  It 
coordinates  the  various  other  components  to  register  their  values  during  certain  clock 
cycles.  Among  other  things,  the  controller’s  job  is  to  control  the  flow  of  data  into 
and  out  of  the  memory  unit.  The  controller  is  a  one-hot  encoded  state  machine  with 
nine  states,  si  through  s9  (OOOOOOOOI2  through  IOOOOOOOOO2).  State  changes  occur 
based  on  the  value  of  an  internal  counter  that  increments  and  resets  based  on  the 
clock  and  conditions  within  the  states.  Listing  III. 8  shows  a  portion  of  the  VHDL 
code  including  state  si. 

Listing  III. 8: 

1 

2 

3 

4 

5 

6 


Architecture  behav  Of  controller  Is  --This  controller  will 
use  one-hot  encoding 

Type  state_type  Is  (si,  s2 ,  s3 ,  s4  ,  s5  ,  s6  ,  s7  ,  s8  ,  s9); 

Attribute  enum_encoding :  string; 

Attribute  enum_encoding  of  state_type :  type  is 
"000000001  000000010  000000100  000001000  000010000  ... 
000100000  001000000  010000000  100000000"; 
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Figure  3.5:  Kalman  filter  algorithm  fiowchart. 
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Figure  3.6:  Top  Level  diagram  of  the  Kalman  filter  VHDL  model. 


7 

8 
9 

10 

11 

12 

13 

14 


Signal  CS ,  NS  :  state_type; 

Signal  counter_out  :  signed(4  Downto  0) ; 

Begin 

.***************** Begin  comb_proc  Process... 


15 

comb_ 

pr  oc 

:  Process 

(elk  , 

reset ) 

16 

17 

Variable 

counter  : 

signe 

d (4  Downt ' 

18 

19 

Begin 

20 

21 

If 

(res 

et  =  MO 

Then 

22 

counter  :=  " 

00000" 

) 

23 

mem 

.control 

< 

=  "0000"; 

24 

ALU 

< 

=  "0011"; 

25 

ALU 

_2 

< 

=  "0011"; 

:=  "00000"; 
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26 

27 

28 

29 

30 

31 

32 

33 

34 

35 

36 

37 

38 

39 

40 

41 

42 

43 

44 

45 

46 

47 

48 

49 

50 

51 

52 

53 

54 

55 

56 

57 

58 

59 

60 


mux_ 1  <=  ’  1  ’  ; 

mux_2  <=  ^ 1 ^  ; 

r  ecipr  ocal_r  eset  <= 

reciprocal_load  <=  ^0^; 

r eciprocal_mux_contr ol  <=  ^0^; 

CS  <=  SI; 

NS  <=  SI; 


Elsif  (elk  ^  event  And  elk  = 
CS  <=  NS; 

eounter_out  <=  eounter; 
eounter  :=  eounter  +  1; 

Case  CS  Is 

When  SI  => 


lOThen 


mem_eont r ol  <= 

"0000" 

ALU_1  <= 

"0011 " 

ALU_2  <= 

"0011 " 

mux_l  <= 

mux_2  <= 

reeiproeal_reset 

<= 

reeiproeal_load 

<= 

^0  ^ 

reeipr oeal_mux_eontr ol  <= 

^0  ^  ; 

If  (eounter  =  "00010 

")  Then 

output _reg_load 

<=  M  ' 

} 

Elsif  (eounter  =  "00011") 

NS  <=  S2; 

Then 

output _reg_load 

A 

II 

O 

) 

Elsif  (eounter  =  "00100")  Then 
mem_eontr ol  <=  "1001"; 

ALU_1  <=  "0011"; 
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61 

ALU_2  <= 

"0011  " 

} 

62 

mux_l  <= 

^1^ 

63 

mux_2  <= 

64 

reciprocal_reset 

<= 

’1’; 

65 

reciprocal_load 

<= 

’O’; 

66 

r ecipr ocal_mux_contr ol  <= 

’O’; 

67 

68 

counter  :=  "00000"; 

69 

70 

End  If; 

71 

72 

When  S2  =  >  .  .  . 

3.6.3  Memory  Unit.  The  memory  unit  is  a  combination  RAM  and  ROM 
that  allows  for  both  hard-coded  constants  and  writable  variables  to  exist  inside  a 
single  unit.  The  memory  uses  a  four  bit  address  to  access  data  in  chunks  of  four. 
That  is,  each  address  effectively  is  associated  with  four  data  that  are  simultaneously 
output  on  four  separate  buses.  Each  bus  consists  of  four  separate  sub-buses  that  each 
make  up  one  of  four  components  of  a  matrix.  The  way  that  this  memory  associates 
an  address  with  the  data  that  is  to  be  output  makes  it  a  unique  and  highly  custom 
memory.  The  memory  has  an  asynchronous  reset  that  returns  all  memory  locations 
to  a  default  starting  value.  Every  number  is  stored  in  the  form  of  a  2  x  2  matrix 
inside  the  memory. 

Variables  (matrix  variables)  are  stored  in  six  separate  locations  with  each  loca¬ 
tion  (registers)  composed  of  four  numbers.  These  location  names  and  their  associated 
stored  variables  can  be  seen  in  Table  3.5  on  page  38. 

Constants  are  also  stored  in  the  form  of  a  2  x  2  matrix  inside  the  memory. 
These  values  represent  various  parameters  of  the  Kalman  hlter  and  can  be  set  by  the 
designer  prior  to  compilation  and  synthesis.  See  Listing  III.  10  for  an  example  of  the 
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Table  3.5:  Memory  locations  and  their  associated  stored  variable. 


reg_0 
reg_l 
reg_2 
reg_3 
reg_4 
reg 5 


X 

P 

residual 
H  xP 
K 

KxHxP 


signal  assignments  for  a  32-bit  example.  See  Listing  III. 9  for  the  input  and  output 
ports.  Also,  see  Listing  III.ll  for  output  assignments. 


1 

2 

3 

4 

5 

6 

7 

8 
9 

10 

11 


Listing  IIL9: 


Entity  mem  Is 

Generic (high_bit 

natural 

:=  31) 

) 

Port ( elk  ,  reset 

In  std_ 

logic ; 

control  :  In 

signed  (3 

Downt  0 

0)  ; 

inl_00 ,  inl_ 

01 

,  inl.lO 

,  inl_ 

11  :  In 

signed  (high_bit 

Downto  0) 

} 

in2_00 ,  in2_ 

01 

,  in2_10 

,  in2_ 

11  :  In 

signed  (h 

igh_bit 

Downto  0) 

} 

A_00 ,  A_01, 

A_ 

10,  A_ll 

:  Out 

signed 

( high_bit 

Downto 

0)  ; 

B_00 ,  B_01, 

B_ 

10,  B_ll 

:  Out 

signed 

( high_bit 

Downt  0 

0)  ; 

o 

1 

o 

o 

o 

1 

o 

C_ 

10,  C_ll 

:  Out 

signed 

( high_bit 

Downt  0 

0)  ; 

o 

1 

Q 

o 

o 

1 

Q 

D_ 

10,  D_ll 

:  Out 

signed 

( high_bit 

Downto 

0))  ; 

End  entity  mem; 

1 

2 

3 

4 

5 


Listing  III.  10: 

--K  *  H  *  P( : , : ,k) 

Signal  reg5_00  :  signed (high_bit  Downto  0) 

00000000000000000000000000000000 " ; 
Signal  reg5_01  :  signed (high_bit  Downto  0) 

00000000000000000000000000000000 " ; 
Signal  reg5_10  :  signed (high_bit  Downto  0) 

00000000000000000000000000000000 " ; 
Signal  reg5_ll  :  signed (high_bit  Downto  0) 
00000000000000000000000000000000"  ;  .  .  . 
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Listing  III.  11: 


1 

Elsif  (elk  ^  event  And  elk  =  MO  Then 

2 

ease  eontrol  is 

3 

4 

when  "0000"  =>  --1 

5 

> 

1 

0 

0 

A 

II 

H- 

1 

0 

0 

6 

A_01  <=  phi.Ol; 

7 

A_10  <=  phi.lO; 

8 

A_ll  <=  phi.ll; 

9 

10 

B_00  <=  reg0_00  ; 

11 

B_01  <=  regO.Ol; 

12 

B_10  <=  regO.lO; 

13 

B_ll  <=  regO.ll  ;  .  .  . 

3.6.4  Arithmetic  Logic  Unit.  The  Arithmetic  Logic  Unit  (ALU)  is  the  por¬ 
tion  of  the  design  that  performs  the  actual  calculations.  The  overall  design  consists  of 
two  ALUs  working  in  parallel.  Each  ALU  is  capable  of  performing  one  of  three  tasks: 
matrix  addition,  matrix  subtraction,  and  matrix  multiplication.  The  matrix  addition 
and  matrix  subtraction  are  performed  by  adding  or  subtracting  corresponding  ele¬ 
ments  of  the  matrices.  For  two  matrices  of  size  m  x  n  their  addition  or  subtraction 
produces  an  mxn  matrix  result.  The  matrix  multiplication  is  accomplished  as  shown 
in  Figure  3.7  on  page  40.  Also  see  Figure  3.8  on  page  40  for  the  matrix  multiplication 
process. 

3.6.5  Newton- Raphson  Reciprocal.  The  behavioral  model  of  the  Kalman 
filter  in  VHDL  is  not  required  to  synthesize;  this  allows  for  a  simpler  implementation. 
As  mentioned  in  Section  3.5.4  on  page  25  it  is  only  necessary  to  carefully  bit  shift 
in  order  to  achieve  a  correct  outcome  when  performing  the  reciprocal.  For  synthesis 
purposes  an  RTL  description  is  required  to  not  only  allow  for  optimizations  but  to  also 


39 


Multi 


Mult2 


Mult3 


Mult4 


MultS 


Multe 


Mult? 


MultS 


A(00)  '  '  A(01)  '  "  A(10)  "  '  A(ll)  "I 


B(00)  B(01)  B(10)  B(ll) 


"  C(10)  ' 


C(ll) 


C(01) 


C(00) 


Figure  3.7:  Graphical  representation  of  matrix  multiplication. 


^00 

^01 

X 

Bqq 

Bqi 

■  Goo 

Goi  ■ 

^10 

^11 . 

Bio 

Bn 

.  Gio 

Cii 

C'oo  =  ^00  X  -^00  +  ^01  Biq 

Goi  =  Aqq  X  Bq\  +  Aqi  X  Bii 

Cio  =  Aiq  X  Bqq  +  All  X  Biq 

Cii  =  Aio  X  Bqi  +  All  X  Bii 

Figure  3.8:  2x2  matrix  multiplication. 

implement  a  synthesizable  reciprocation  function.  The  chosen  method  for  reciproca¬ 
tion  is  called  Newton-Raphson  Division.  The  Newton-Raphson  division  method  lends 
itself  well  to  the  special  case  of  finding  the  reciprocal.  There  are  several  ways  to  per¬ 
form  division  in  digital  designs.  The  methods  can  be  classified  as  either  a  fast  division 
algorithm  or  a  slow  division  algorithm.  The  slow  division  methods  produce  one  digit 
of  the  quotient  per  iteration  while  the  fast  division  methods  start  with  an  estimate 
and  arrive  at  a  quotient  through  multiple  iterations  of  an  algorithm;  doubling  the 
number  of  correct  bits  each  time  through.  In  order  to  achieve  maximum  speed  in  the 
overall  design,  the  slow  methods  were  not  considered.  The  two  methods  considered 
for  fast  division  are  the  Newton-Raphson  method  and  the  Goldschmidt  method.  Both 
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methods  are  iterative  and  require  initial  approximations.  The  Goldschmidt  method 
is  a  variation  of  Newton-Raphson  that  lends  itself  well  to  pipelining  [4],  Because 
pipelining  of  the  divider  is  unnecessary  due  to  data  hazards,  it  was  decided  to  use 
the  Newton-Raphson  method  which  directly  computes  the  reciprocal.  Although  the 
method  can  be  used  to  find  the  quotient,  the  process  first  finds  the  reciprocal  of  the 
divisor  and  then  multiplies  the  reciprocal  by  the  dividend  to  produce  the  quotient. 
This  method  converges  to  the  reciprocal  quadratically  [11].  For  the  special  case  of: 


H  X  P{:,:,k)  x  H' +  R  ^  ' 

the  Newton-Raphson  method  is  used  to  calculate  the  reciprocal  which  precludes  the 
final  step  of  multiplying  the  reciprocal  by  the  dividend. 


Figure  3.9:  Top  level  schematic  of  the  Newton-Raphson  reciprocal  VHDL  model. 


41 


3.6.6  Newton-Raphson  Division  Algorithm.  The  Newton-Raphson  iteration 
is  used  to  approximate  the  root  of  a  non-linear  function.  For  a  well  behaved  function 
f{x)  let  r  be  a  root  of  f{x)  =  0.  Now  let  xq  be  an  estimate  of  r  where  r  =  xq  +  /i;  h  is 
the  difference  between  the  estimate  and  truth.  Assuming  the  estimate  is  sufficiently 
accurate,  it  can  be  concluded  that  by  linear  approximation: 


0  =  f{r)  =  f{xo  +  h)^  f{xo)  +  h'f{xo) 

(310) 

And  therefore, 

h  ~ 

~  f'ixo) 

(311) 

It  follows  that 

,  U  /(^o) 

Xo  +  h  PS  Xo  . 

f  [xo) 

(312) 

and  therefore 

f{xo) 

(313) 

r  then  becomes  a  new  and  improved  estimate  Xi.  In  general  this  can 

be  stated  as: 

+ 

II 

1 

O 

(314) 

For  the  case  of  the  reciprocal 

/(^)  =  j 

(316) 

/'(*)  = 

(316) 

Substituting  equations  3.15  and  3.16  into  equation  3.14  yields 

A-D 

Xi 

Xj+i  =Xi  *  ^ 

Xj+i  =  2xj  —  x^D 

(317) 
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Here  is  a  simple  example  of  this  algorithm.  For  a  relatively  bad  estimate  of  0.3  can 
be  made.  This  means  that  D  =  2  and  Xj  starts  at  0.3. 

2(0.3)  -  (0.3)2(2)  =  0.42 

2(0.42)  -  (0.42)2(2)  =  0.4872 
2(0.4872)  -  (0.4872)2(2)  =  0.49967232 
2(0.49967232)  -  (0.49967232)2(2)  =  0.49999978 

As  can  be  seen,  in  only  four  iterations  the  Newton-Raphson  approximation  to 
the  reciprocal  problem  of  ^  rapidly  approaches  the  correct  answer.  Even  with  the  bad 
hrst  estimate  the  algorithm  converges  quadratically  requiring  relatively  few  iterations. 
The  maximum  relative  error  for  any  k-bits-in  m-bits-out  ROM  reciprocal  table  is  the 
result  of  the  relative  errors  obtained  between  the  actual  reciprocal  ^  and  the  lookup 
table  value  of  x  for  1  <  a:  <  2.  A  table  precision  of  a  bits,  i.e.  a  entries  will  always 
yield  a  maximum  error  of  at  most  A  [i]  As  a  worst  case  example,  if  16  bits  of 
accuracy  is  required  in  an  initial  estimate  and  only  one  bit  is  available  for  lookup 
purposes  then  the  worst  case  error  will  be  ^  =  0.5.  Using  this  worst  case  scenario  in 
the  example  above  yields  an  initial  estimate  of  1.0.  Therefore,  D  =  2  and  Xi  starts  at 
1.0. 

2(1.0)  -  (1.0)2(2)  =  0 
2(0)  -  (0)2(2)  =  0 

As  can  be  seen,  the  equations  do  not  converge.  At  least  two  bits  are  required  when 
calculating  the  initial  estimate  in  order  to  get  convergence.  As  another  example,  if 
two  bits  are  available  to  calculate  or  lookup  the  initial  estimate  then  the  worst  case 
error  will  be  ^  =  0.25.  Once  again,  using  the  example  from  above;  D  =  2  and  x, 
starts  at  1. 

2(0.75)  -  (0.75)2(2)  =  0.375 
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2(0.375)  -  (0.375)2(2)  =  0.46875 

2(0.46875)  -  (0.46875)2(2)  =  0.498046875 

2(0.498046875)  -  (0.498046875)2(2)  =  0.499992370606 

2(0.499992370606)  -  (0.499992370606)2(2)  =  0.499999999884 

As  can  be  seen,  in  order  to  achieve  at  least  the  same  level  of  precision  as  seen  in  the 
hrst  example,  one  more  iteration  is  required.  Note,  with  the  extra  iteration,  the  level 
of  precision  of  the  last  example  exceeds  that  of  the  first. 

3. 6. 1  Initial  Estimate.  The  number  of  iterations  required  to  converge  to  an 
acceptable  answer  using  the  Newton-Raphson  reciprocal  algorithm  described  above 
depends  on  the  accuracy  of  the  approximation  [5].  Two  algorithms  were  tested  for 
calculating  the  initial  estimates.  The  first  algorithms  tested  is  Equation  3.18. 

(3-18) 

Where,  D'  =  [l.did2...dM]  and  (M  +  1)  is  the  accuracy  in  bits  of  the  initial  approxi¬ 
mation  [5].  Despite  providing  accurate  estimates  for  the  Newton-Raphson  reciprocal 
function  the  second  algorithm  tested  provided  even  greater  accuracy.  The  algorithm 
chosen  for  generating  estimates  for  the  Newton-Raphson  reciprocal  can  bee  seen  in 
Equation  3.19. 

^stored  =  TTWWTTTT  (3.19) 

(■‘+2^-1 

2 

Where  C  =  [did2...dM\]  i.e.  D'  is  the  address  into  the  memory  and  the  first  eight  bits 
after  the  radix  of  the  normalized  number  for  which  the  reciprocal  is  being  calculated. 
The  number  of  bits  contained  in  C  is  called  a.  Eight  bits  were  chosen  for  C  to  keep 
the  size  of  the  ROM  as  small  as  possible  while  maintaining  the  desired  accuracy.  The 
eight  bit  address  into  the  ROM  corresponds  to  a  ROM  with  256  entries. 
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3. 6. 8  Newton-Raphson  Hardware  Implementation.  The  top-level  view  of  the 
hardware  implementation  of  the  Newton-Raphson  reciprocal  can  be  seen  in  Figure  3.9 
on  page  41.  The  Newton-Raphson  reciprocal  only  calculates  the  reciprocal  for  a  single 
number  and  not  an  entire  matrix.  As  can  be  seen,  it  is  broken  up  into  three  main 
stages: 

1.  Preliminary  sign  checking  and  normalization. 

2.  Approximation  lookup  in  ROM. 

3.  Iterative  calculations. 

The  following  sub-sections  contain  explanations  of  the  three  stages  mentioned  above. 

3.6.8. 1  Preliminary  Sign  Checking  and  Normalization.  The  first  step, 
at  the  input  point,  is  to  check  to  see  if  the  number  x  for  which  you  want  to  know 
the  reciprocal  ^  is  positive  or  negative.  If  the  number  is  negative  then  a  flag  is  set  to 
later  indicate  that  the  number  is  negative.  If  the  number  was  negative  then  it  is  made 
positive  by  taking  the  two’s  complement.  Finally  the  number  must  be  normalized  to 
produce  a  number  x  where  1  <  a:  <  2  which  is  required  for  approximation  lookup  in 
the  lookup  table.  Normalization  is  done  by  shifting  the  number  either  left  or  right. 
For  example,  the  8-bit  number  OIOO.IOOO2  is  normalized  by  right  shifting  by  two  bits 
thus  producing  OOOI.OOIO2.  A  direction  bit  for  de- normalizing  is  set  as  well  as  the 
shift  quantity  value.  The  direction  bit  indicates  the  direction  of  the  original  shift  and 
the  shift  quantity  designates  how  many  bits  to  shift. 

3. 6. 8. 2  Getting  the  Estimate:  Lookup-Table.  The  normalized  value  is 
then  passed  on  to  the  ROM  as  well  as  the  Multiplier.  The  ROM  uses  the  first  eight 
bits  from  the  right  side  of  the  radix  of  the  normalized  value  as  an  address  into  the 
ROM.  The  approximation  to  the  reciprocal  is  passed  through  a  multiplexor  and  into 
a  register.  It  is  at  this  point  that  the  iterative  portion  of  the  algorithm  begins. 
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3. 6. 8. 3  Iterative  Portion.  The  iterative  portion  performs: 


Xj_|_i  =  2xi  —  x^D 

Where  D  is  the  denominator  and  Xi  is  the  current  estimate  for  the  reciprocal.  Three 
iterations  of  the  above  equation  are  performed  producing  the  reciprocal  in  its  nor¬ 
malized  form.  At  this  point  it  is  necessary  to  de-normalize  the  number  by  shifting 
the  appropriate  direction  by  the  appropriate  number  of  bits.  If  the  sign  bit,  set  in 
the  preliminary  sign  checking  and  normalization  portion,  indicates  the  number  was 
negative  a  two’s  complement  conversion  is  performed  and  the  final  output  is  ready. 

3.7  Two  Dimensional  Implementation 

The  Kalman  filter  design  discussed  previously  is  a  one-dimensional  Kalman 
filter.  That  is,  linear  position  is  input  into  the  filter  and  an  estimate  of  the  linear 
position  and  linear  velocity  is  output.  The  number  representing  the  velocity  has  a 
magnitude  represented  by  the  absolute  value  of  the  number  and  a  direction  indicated 
by  the  sign  of  the  number. 

For  target  tracking  purposes,  a  one  dimensional  Kalman  filter  would  be  of  lim¬ 
ited  utility.  However,  combining  two  filters  together  and  appropriately  combining 
the  velocities  produces  an  efficient  real-world  position  and  velocity  estimator  of  much 
greater  utility. 

3. 1. 1  Combination  of  Two  Linear  Filters.  A  two-dimensional  Kalman  filter 
is  created  by  combining  two  one-dimensional  filters  together.  Each  filter  takes  as 
input  either  the  x  coordinate  or  the  y  coordinate  from  a  cartesian  plane.  Each  of  the 
Kalman  filters  output  a  separate  position  and  velocity.  The  position  value  represents 
where  along  the  associated  axis  the  target  is  located.  The  velocities  must  be  combined 
using  Equation  3.20. 

velocity  =  ^vl  +  v^  (3.20) 
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Equation  3.20  shows  that  the  combined  velocities  of  the  one-dimensional  Kalman 
hlters  is  equal  to  the  square-root  of  the  sum  of  the  squares;  Pythagorean’s  theorem. 

3.7. 1.1  Hardware  Implementation.  Implementing  equation  3.20  re¬ 
quires  three  main  steps: 

1.  Squaring  of  the  one-dimensional  velocities. 

2.  Addition  of  the  squares. 

3.  Taking  the  square  root. 

In  VHDL,  multiplication  is  considered  a  basic  operator  and  is  part  of  the  VHDL 
synthesis  package  numeric_std.  Addition,  likewise,  is  also  include  in  numeric_std. 
Neither  of  these  operators  require  special  programming  in  order  to  be  synthesizable. 
However,  the  square-root  operator  is  not  part  of  the  numeric_std  synthesis  package 
and  therefor  presents  a  designer  with  the  difficult  task  of  implementation. 


Starting  Approximatinn  Precisian  (in  bits) 

Figure  3.10:  Number  of  iterations  versus  starting  approximation. 
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Considering  the  requirement  of  this  thesis  to  produce  fast,  small  circuits,  a 
square-root  method  using  the  Newton-Raphson  method  similar  to  that  used  to  find 
the  reciprocal  is  implemented.  The  iterative  equation  for  finding  the  reciprocal  of  the 
square-root  is  [8]: 


xA^  —  Dx]) 
^i+i  = - ^ - 


(3.21) 


Where  D  is  the  number  for  which  it  is  desired  to  find  the  square  root;  i.e.  1  / a/D- 
As  mentioned  above.  Equation  3.21,  finds  the  reciprocal  of  the  square-root.  In  order 
to  find  the  square-root  from  its  reciprocal  it  is  required  to  multiply  by  D. 


-^D  =  ^/D 
^/D 


As  with  finding  the  reciprocal,  the  initial  estimate  will,  in  part,  determine  the 
number  of  iterations  of  equation  3.21  that  must  be  performed  to  achieve  a  desired 
accuracy.  Figure  3.10  on  page  47  shows  approximately  how  many  iterations  of  Equa¬ 
tion  3.21  are  required  to  achieve  53  bits  of  accuracy  [16].  In  the  case  of  the  two 
dimensional  Kalman  filter,  excess  clock  cycles  that  arise  from  calculation  of  the  linear 
filters  allows  for  five  iterations  with  clock  cycles  to  spare. 

Normalization  of  the  square-root  operand  is  required  in  order  to  both  initially 
populate  lookup  tables  as  well  as  for  accessing  estimates  from  those  tables.  Two  types 
of  normalization  are  required.  If  a  =  log2A  and  A  is  the  value  of  the  highest  order 
bit  then  the  first  type  of  normalization  is  for  even  numbered  a  and  the  second  type 
is  for  odd  numbered  a.  For  example,  the  binary  number  IOIO.IIOO2  has  a  highest 
ordered  bit  that  has  a  value  of  2^  =  8;  because  a  is  equal  to  an  odd  number,  (3), 
to  normalize  the  binary  number  it  must  be  right-shifted  a  +  1  times  to  produce  a 
number  of  the  form  QAxxxxxxx2-  Binary  numbers  greater  than  or  equal  to  one  and 
with  an  even  numbered  a  must  be  right-shifted  a  times  to  produce  a  number  of 
the  form  l.xxxxxxxx2.  Binary  numbers  greater  than  or  equal  to  one  and  with  an 
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odd  numbered  a  must  be  right-shifted  a  +  1  times  to  produce  a  number  of  the  form 
O.IXXXXXXX2.  Binary  numbers  less  than  or  equal  to  one  and  with  an  even  numbered 
a  must  be  left-shifted  a  times  to  produce  a  number  of  the  form  l.xxxxxxxx2.  Binary 
numbers  less  than  or  equal  to  one  and  with  an  odd  numbered  a  must  be  left-shifted 
a  —  1  times  to  produce  a  number  of  the  form  0.1xxxxxxx2. 

Equation  3.22  shows  the  method  for  calculating  an  estimate  for  a  normalized 
number  with  an  even  shift  value  where  D'  =  [I.did2...d8],  i.e.  the  hrst  eight  bits  of 
the  fraction  portion  of  the  normalized  number. 


(3.22) 


Equation  3.23  shows  the  method  for  calculating  an  estimate  for  a  normalized 
number  with  an  odd  shift  value  where  D'^  =  [0.1did2...ds]^  i.e.  the  first  eight  bits  after 
the  first  1  of  the  fraction  portion  of  the  normalized  number. 


(3.23) 


Listing  III.  12: 


1 

2 

3 

4 

5 

6 

7 

8 
9 

10 


Case  address  Is 

When  "00000000"  =>  -- 0 . 999024865687 1403 

est imate_intermediate  :=  "1111111111000000"; 

When  "00000001  "  =>  -- 0 . 997083 1 245596092 

est imat e_int ermediat e  :=  "1111111101000000"; 

When  "00000010"  =>  -- 0 . 995 1 5266 1 7 1 37967 

est imat e_int ermediat e  :=  "1111111011000010"; 
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11 


12 

13 


When  "00000011  "  =>  --0 . 9932333683907213 

estimate_intermediate  :=  "1111111001000100"; 


Listing  111.12  shows  four  approximation  entries  for  the  even  a  ROM.  The  ap¬ 
proximations  are  calculated  using  Equation  3.22  on  page  49. 

Listing  111.13: 


1 

Case  address  Is 

2 

3 

When  "00000000"  =>  -- 1 . 4 1 28345 142027 1 34 

4 

est imate_intermediate  :=  "0110100110101111"; 

5 

6 

When  "00000001  "  =>  -- 1 . 4 1 008847756554 1 6 

7 

est imate_intermediate  :=  "0110100011111011"; 

8 

9 

When  "00000010"  =>  -- 1 . 407358390827336 

10 

est imate_intermediate  :=  "0110100001001000"; 

11 

12 

When  "00000011  "  =>  -- 1 . 404644 1 00 1 796708 

13 

est imate_intermediate  :=  "0110011110010110"; 

Listing  111.12  shows  four  approximation  entries  for  the  odd  a  ROM.  The  ap¬ 
proximations  were  calculated  using  Equation  3.23  on  page  49. 

After  normalization,  and  estimate  lookup,  the  iterative  portion  of  the  process 
begins.  Five  iterations  of  Equation  3.21  on  page  48  are  performed  followed  by  mul¬ 
tiplication  of  the  reciprocal  square-root  by  the  square-root  operand.  It  is  this  step 
where  the  actual  square-root  is  calculated.  The  final  step  is  that  of  denormalization, 
where  the  answer  is  shifted  in  the  opposite  direction  as  in  normalization.  The  number 
of  shifts  for  denormalization  is  one-half  the  number  of  shifts  required  for  normaliza¬ 
tion.  It  is  now  that  the  combined  two-dimensional  velocity  has  been  fully  calculated 
and  is  output. 
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Figure  3.11:  Top  level  schematic  of  the  two-dimensional  implementation  or  combi¬ 
nation  of  linear  Kalman  filters. 

3.8  Design  Flexibility 

It  is  desired  to  make  the  design  flexible.  Flexibility  allows  a  designer  that 
wants  to  utilize  the  VHDL  Kalman  filter  to  designate  the  value  and  bit  width  of 
the  Kalman  filter  parameters.  A  Kalman  Alter  has  various  parameters  that  affect  its 
overall  behavior.  For  example,  process  noise  covariance  Q  and  measurement  noise 
covariance  R  can  be  tuned  according  a  system  model  , producing  the  desired  behavior 
of  the  Alter.  In  order  to  give  a  designer  this  kind  of  control  and  flexibility  while  still 
producing  synthesizable  code  an  alternate  programming  language  to  VHDL  is  needed. 
JAVA  was  chosen  for  its  ability  to  execute  on  most  multipurpose  computer  systems. 

It  is  assumed  that  the  reader  has  at  least  a  basic  understanding  of  programming 
and  JAVA.  The  JAVA  code  consists  of  four  packages: 

1.  Decimal  to  binary  converter:  DecimalToBinary.java 

2.  Code  generator:  CodeObject.java 

3.  Kalman  Alter  parameter  initializers:  Initializers.java 

4.  A  main  function:  KLmain.java 

3.8.1  Decimal  to  Binary  Converter.  The  decimal  to  binary  converter  al¬ 
lows  for  the  automatic  conversion  of  decimal  numbers.  Three  user  defined  integers, 
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Figure  3.12:  Top  level  schematic  of  the  primary  module  for  the  two-dimensional 

implementation  or  combination  of  linear  Kalman  filters. 

data_size,  fracton_size,  and  rom_estimate_size  determine  the  width  of  the  num¬ 
ber  in  binary  and  the  size  of  the  fraction  portion  of  the  number.  These  three  integers 
are  passed  to  the  method  when  an  object  of  type  DecimalToBinary  is  created.  The 
returned  value  is  a  string  of  ones  and  zeros  that  is  the  binary  representation  of  the  dec¬ 
imal  number  .  The  converter  uses  various  JAVA  methods  to  generate  a  binary  two’s 
complement  nnmbers  the  width  of  data_size  or  the  width  of  rom_estimate_size. 

The  integer  rom_estimate_size  is  nsed  to  indicate  the  size  of  the  data  inside 
the  ROM  lookup  table  that  is  used  in  the  Newton- Raphson  reciprocal  calculation.  It 
will  typically  be  significantly  smaller  than  data_size  as  it  is  only  an  estimate.  The 
smaller  size  of  the  estimates  also  minimizes  the  size  of  the  ROM.  For  the  two  designs 
tested  in  this  thesis,  ROM  estimates  of  size  8  bits  and  16  bits  are  nsed  for  the  32-bit 
and  64-bit  versions  respectively. 

3. 8. 2  Code  Generator.  The  code  generator  generates  all  of  the  Kalman  filter 
VHDL  code.  Inside  the  main  function,  Kf_main.jave,  an  object  of  type  CodeOb- 
ject  is  created  and  four  integers  are  passed  into  it:  rom_estimate_size,  data_size. 
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Listing  III.  14: 


1 

double  dt  =  0.1; 

2 

double  R  =  10.0; 

3 

double  Q  =  100.0; 

4 

double  []  G  =  {0.0,  1.0}; 

5 

double  []  B  =  {0.0 ,1.0>; 

6 

doublet]  Bd  =  {0.005,  0.1}; 

7 

double  []  H  =  {1.0,  0.0}; 

8 

doublet]  H_prime  =  {1.0,  0.0}; 

9 

doublet]  F  =  {0.0,  1.0,  0.0,  0.0}; 

10 

doublet]  phi  =  {1.0,  0.1,  0.0,  1.0}; 

11 

doublet]  phi_prime  =  {1.0,  0.0,  0.1,  1.0}; 

12 

doublet]  Qd  =  {0.0333,  0.5,  0.5,  10.0}; 

13 

doublet]  Gd  =  {1.0,  0.0,  0.0,  1.0}; 

14 

doublet]  X  =  {1.0,  1.0}; 

15 

doublet]  P  =  {0.25,  0.0,  0.0,  0.25}; 

fraction_size,  and  array _size.  The  new  object  of  type  CodeObject  can  then  be 
used  to  call  the  method  inside  CodeObject  .java  that  creates  the  VHDL  modules. 
Table  3.6  on  page  53  shows  the  methods  within  CodeObject: 


Table  3.6:  These  CodeObject  methods  each  generate  a  corresponding  VHDL  hie. 


kf_top(); 

sub_behavioral() ; 
KF_RTL_top(); 
mux_4_to_2(); 
mux(); 

reciprocaLstage3  ( ) ; 
NR LT ROM(); 


kf_top_tb(); 

ALUQ; 

KF_RTL_top_tb(); 
mux_2_to_l(); 
reciprocaLtopO; 
reciprocal_stage6() ; 


add_sub_behavioral() ; 

controller(); 

mem(); 

mult_behavioral() ; 
reciprocaLstagel  () ; 
register_ALU(); 


The  user  adjustable  Kalman  hlter  parameters  are  initialized  inside  CodeOb- 
ject.java.  See  Listing  111.14  for  a  list  of  these  parameters.  Many  of  these  parameters 
are  matrices  and  are  created  as  type  array  in  JAVA.  The  integer  array  .size  is  passed 
into  a  method  called  Array _size_initializer  where  it  is  used  to  designate  the  size  of 
the  arrays. 
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3.8.3  Initializers.  This  file  creates  the  objects  seen  in  Listing  111.14.  As 
mentioned  above,  the  user  settable  integer  constant  that  determines  the  size  of  the 
arrays  is  called  array _size.  This  constant  can  be  set  by  the  user  inside  main. 

3.8.4  Main.  The  method  mam  is  where  the  integer  constants:  rom_estimate_size, 
data_size,  fracton_size,  and  array _size  are  set.  Also,  it  is  inside  main  where  an 
object(s)  of  type  CodeObject  is  created.  That  CodeObject  is  then  used  to  call 

the  various  methods  inside  CodeObject  .java  which  then  writes  the  VHDL  Kalman 
filter  to  file. 
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IV.  Testing  and  Evaluation 


f  j  ~^his  chapter  discusses  the  testing  approach,  test  application,  and  test  results. 

4-1  Testing  Approach 


Testing  of  the  Kalman  hlter  VHDL  model  was  performed  to  verify  function  of 
the  model  in  comparison  with  output  data  produced  by  the  Matlab®  version  of  the 
hlter.  Identical  inputs  were  run  through  both  versions  of  the  hlter  and  the  outputs 
compared.  Input  vector  z{:,k)  consists  of  500  pseudo-random  noise-corrupted  mea¬ 
surements.  The  input  values  as  generated  in  Matlab  have  a  standard  deviation  of 
323.9967  indicating  a  wide  range  of  values.  The  values  range  from  a  maximum  of 
452.907548  to  a  minimum  of  —856.481355.  Two  versions  of  the  VHDL  Kalman  hlter 
were  tested:  a  32-bit  version  and  a  64-bit  version. 


4.1.1  The  Test  Beneh.  The  32-bit  and  64-bit  test  benches  were  created 
using  JAVA  to  populate  each  of  them  with  their  hxed  point  radix  binary  numbers. 
The  test  bench  consists  of  500  inputs  z  that  are  cycled  through  the  VHDL  Kalman 
hlter  to  produce  500  position  data  and  500  velocity  data.  A  simulation  was  run  in 
Mentor  Graphics  ModelSim®  SE  Plus  6.3c  revision  2007.09  for  both  the  32-bit  and 
64- bit  versions  of  the  VHDL  Kalman  hlter  with  their  respective  test  benches.  A  list 
of  the  output  was  created  from  the  wave  diagram  and  exported  to  a  hie.  The  binary 
outputs  were  then  converted  to  integers  using  a  JAVA  binary-to-decimal  converter 
written  specihcally  for  this  thesis.  The  JAVA  binary-to-decimal  converter  produced 
a  text  hie  containing  the  integer  version  of  the  ModelSim®  output  that  was  then 
imported  into  a  spreadsheet  for  analysis. 

4.1.2  Analysis.  Analysis  consisted  of  a  look  at  possible  error  sources,  fol¬ 
lowed  by  calculation  of  the  standard  deviation  for  the  difference  and  a  percentage- 
difference  between  the  output  produced  by  Matlab®  and  the  ModelSim®  simulation 
output  for  the  VHDL  Kalman  hlter. 
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4 .1.2.1  Error  Analysis.  Errors  can  arise  from  various  sources.  One 
way  that  error  can  be  introduced  is  in  normalization  of  numbers.  The  reciprocal 
square-root  function  required  normalization  of  the  input  in  order  to  use  the  even  and 
odd  ROM  lookup  tables.  The  worst  case  scenario  is  that  a  32-bit  binary  number  with 
a  16-bit  fraction  of  the  form 

0111111111111111. 111111111111111I2 

is  normalized  by  right  shifting  by  14  bits.  The  normalized  number  would  look  like: 

0000000000000001. 111111111111111I2 

which  means  the  right  14  bits  of  the  fraction  are  lost. 

=  +0.25  (4.1) 

Equation  4.1  shows  the  maximum  error  that  might  occur  due  to  normalization.  To 
avoid  this  error,  padding  of  the  numbers  prior  to  normalization  would  preserve  the 
accuracy.  The  required  bit-padding  would  be  the  size  of  the  fraction  portion  minus 
two  divided  by  two.  If  the  number  of  bits  to  be  padded  is  6_|_  and  the  number  of  bits 
in  the  fraction  is  x  then: 


The  divide  by  two  is  due  to  the  fact  that  when  performing  a  reciprocal  square-root, 
denormalization  requires  a  shift  that  is  half  the  value  of  the  normalization  shift  and 
in  the  opposite  direction. 

For  the  reciprocal  function  used  in  the  Kalman  filter,  both  normalization  and 
denormalization  bit  shifts  are  in  the  same  direction  and  of  the  same  magnitude.  This 
means  that  no  error  will  occur  due  to  the  normalization  process.  For  example,  if 
the  binary  number  0111111111111111. IIIIIIIIIIIIIIII2  is  normalized  it  takes  on 
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the  form  0000000000000001. IIIIIIIIIIIIIIII2.  In  this  example,  14  bits  were  shifted 
and  lost.  However,  because  denormalization  for  the  reciprocal  function  requires  a 
shift  in  the  same  direction  and  of  the  same  magnitude  as  normalization,  any  bit¬ 
padding  that  would  have  preserved  the  bits  is  lost  when  converting  the  number  back 
to  its  original  format  of  32  bits.  To  further  demonstrate  this,  consider  the  following 
example. 


1 

0111111111111111. 111111111111111I2 


O.OOOOOOOOOOOOOOIO2 


Now  the  reciprocal  for  the  normalized  number  is  found.  Normalization  required  a 
right  shift  by  14  bits. 


1 

0000000000000001. 111111111111111I2 


O.IOOOOOOOOOOOOOOO2 


Denormalizing  the  answer  from  the  above  equation  produces  the  number: 


O.OOOOOOOOOOOOOOIO2 


which  is  exactly  identical  to  the  answer  for  the  non- normalized  reciprocal.  There  is 
no  loss  of  data. 

J1..I.2.2  Difference  Comparison.  For  the  difference  comparison  the 
difference  was  taken  between  the  output  produced  by  Matlab®  and  the  ModelSim® 
simulation  output  for  the  VHDL  Kalman  filter  for  all  1000  outputs(500  position  out¬ 
puts,  and  500  velocity  outputs).  These  differences  for  position  are  plotted  for  both 
the  32-bit  and  64-bit  test  cases  and  can  be  seen  in  Figure  1(a)  on  page  58  and  Fig¬ 
ure  1(b)  on  page  58  respectively.  The  standard  deviation  was  then  calculated  for  the 
difference.  See  Figure  4.3  on  page  61  for  the  standard  deviation. 
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Figure  4.1:  Difference  taken  between  the  VHDL  Kalman  filter  output  with  32-bit  and  64- bit  fixed  point  representation: 
and  the  Matlab®  Kalman  filter  output.  There  are  500  differences  shown  for  each  figure. 
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4- 1.2. 3  Percentage  Difference  Comparison.  The  percentage  difference 
was  calculated  by  taking  one  minus  the  difference  of  the  output  produced  by  Mat  lab® 
and  the  ModelSim®  simulation  output  for  the  VHDL  Kalman  filter  for  all  1000  out- 
puts(500  position  outputs,  and  500  velocity  outputs).  The  standard  deviation  was 
then  calculated  for  the  percentage  difference.  See  Figure  4.3  on  page  61  for  the  stan¬ 
dard  deviation. 

4. 1.2. 4  Difference  Comparison  Standard  Deviation.  The  standard 
deviations  that  was  calculated  for  the  32-bit  and  64-bit  test  cases  can  be  seen  in 
Figure  4.3  on  page  61.  The  extremely  small  deviation  from  the  mean  indicates  that 
the  VHDL  Kalman  filter  output  is  a  very  good  approximation  to  the  Mat  lab®  im¬ 
plementation  of  the  Kalman  filter.  As  expected  the  standard  deviation  for  the  64-bit 
implementation  is  smaller  than  that  for  the  32-bit  implementation.  Figure  4.2  on 
page  59  shows  the  difference  comparison  for  both  tests.  As  can  be  seen,  the  64-bit 
implementation  varied  less  overall  with  the  Mat  lab®  Kalman  filter  implementation 
output. 

The  large  spikes  seen  in  the  first  iterations  of  Figure  1(a)  and  Figure  1(b)  on 
page  58  are  due  to  the  Kalman  filter  being  in  a  transient  state.  This  transient  state 
is  caused  by  the  initial  default  estimates  of  the  filter  not  being  accurate  estimates 
of  the  state  of  the  system.  These  initial  estimates  are  not  intended  or  expected  to 
be  accurate,  rather,  the  filter  must  iterate  multiple  times  to  reach  a  steady  state. 
Simulations  were  performed  in  Mat  lab®  in  which  a  single  position  measurement  was 
set  as  the  filter  input  and  held  for  100  iterations.  The  filter  achieved  steady-state 
on  the  55*^  iteration  when  the  velocity  went  to  zero  and  the  position,  as  output  by 
the  filter,  became  equal  to  the  input  position.  The  number  of  iterations  required 
to  achieve  steady-state  is  dependent  on  the  filter  parameters  and  therefore  can  be 
changed  to  fit  design  requirements. 

Simulations  were  run  for  the  two-dimensional  implementation  that  resulted  in 
a  standard  deviation  of  0.021272.  As  expected,  this  was  greater  than  the  standard 
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Difference 


32  bits 
64bits 


32  bits 
64bits 


Position  500  Sampies  (16  bit  ROM) 

Veiocity  500  Sampies  (16  bit  ROM) 

0.000269 

0.003725 

0.000190 

0.002407 

Percentage  Difference 

Position  500  Sampies  (16  bit  ROM) 

Veiocity  500  Sampies  (16  bit  ROM) 

0.006000% 

0.127800% 

0.006100% 

0.338500% 

Figure  4.3:  Standard  deviation  for  the  difference  and  percentage-difference  of  the 

VHDL  Kalman  filter  output  (with  a  64-bit  and  32-bit  fixed  point  representation)  and 
the  Mat  lab®  Kalman  filter  output. 


deviation  for  the  one-dimensional  implementation  due  to  error  as  discussed  in  sub¬ 
section  4. 1.2.1.  Figure  4.4  on  page  62  shows  the  simulation  results  as  a  difference 
between  the  two-dimensional  VHDL  Kalman  filter  and  the  two-dimensional  Mat  lab® 
Kalman  filter  as  calculated  using  Microsoft  Excel.  The  large  spikes  were  expected 
due  to  the  normalization  error.  Note  that  the  spikes  do  not  exceed  the  maximum 
calculated  error  of  0.25. 
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4.1.3  Speed  and  Area  Analysis.  Synthesis  testing  to  determine  maximum 
frequency  at  which  the  design  will  run  was  performed  on  a  Xilinx  Virtex-4  model 
4vsx35ff668  speed  grade  -10.  The  same  synthesis  test  determines  resource  usage 
on  the  FPGA.  The  synthesis  was  performed  using  Mentor  Graphics’  Precision  RTL 
Synthesis  2006a. 101.  Tests  were  run  for  both  32-bit  as  well  as  64-bit  versions  of  the 
VHDL  Kalman  filter.  Table  4.3  on  page  65  shows  the  results  for  maximum  frequency 
determination.  As  can  be  seen,  an  initial  frequency  was  chosen  for  the  32-bit  test  of 
lOOMHz.  This  test  resulted  in  negative  slack,  indicating  that  the  design  would  not 
run  at  lOOMHz.  A  frequency  recommended  by  the  synthesis  tool  of  40MHz  was  then 
run,  resulting  in  a  relatively  small  positive  slack  indicating  that  the  design  will  run 
at  40MHz.  Slack  values  specify  amounts  of  extra  propagation  delay  available  on  the 
critical  path.  A  small  positive  slack  indicates  that  the  design  will  just  barely  function 
at  the  specified  frequency.  As  with  the  32-bit  tests,  the  64-bit  VHDL  Kalman  filter 
was  first  tested  at  lOOMHz.  This  test  resulted  in  negative  slack,  indicating  that  the 
design  will  not  run  at  that  frequency.  The  recommended  frequency  for  the  second 
test  was  51MHz  which  resulted  in  a  relatively  small  positive  slack. 

The  results  for  area  can  be  seen  in  Figure  4.1.  The  32-bit  implementation  fit  on 
the  Virtex-4  utilizing  13.82%  of  the  Gonfigurable  Logic  Block  (GLB)  slices,  however, 
the  64-bit  implementation  required  142.78%  of  the  GLB  slices. 


Table  4.1:  Synthesis  test  of  the  one-dimensional  Kalman  filter  using  Precision  RTL 
Synthesis  with  the  Virtex-4  4vsx35ff668  at  speed  grade  -10. 


Test 

GLB  Slices 

lOs 

DSP48s 

32-bit  test  lOOMHz 

13.82% 

22.32% 

37.50% 

32-bit  test  40MHz 

13.82% 

22.32% 

37.50% 

64-bit  test  lOOMHz 

142.78% 

43.75% 

93.75% 

64-bit  test  51  MHz 

142.78% 

43.75% 

93.75% 

Further  testing  was  done  for  the  64-bit  model  due  to  the  area  requirements. 
Because  the  Virtex-4  is  too  small  to  accommodate  the  64-bit  model,  the  Xilinx  Virtex- 
5  model  5vsx95tffll36  speed  grade  -3  was  chosen  instead.  Table  4.4  on  page  65  shows 
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the  frequency  results  for  the  two  tests  performed  and  Table  4.2  on  page  64  shows  the 
results  for  resource  usage. 


Table  4.2:  Synthesis  test  of  the  one-dimensional  Kalman  filter  using  Precision  RTL 
Synthesis  with  the  Virtex-5  5vsx95tffll36  at  speed  grade  -3. 


Test 

CLB  Slices 

lOs 

DSP48s 

64-bit  test  lOOMHz 
64-bit  test  50MHz 

16.51% 

16.51% 

30.63% 

30.63% 

30.94% 

30.94% 

In  reference  to  Table  4.3  on  page  65,  for  the  case  of  the  32-bit  representation, 
40MHz  equates  to  a  clock  period  of  0.000000025  seconds  or  25  ns.  This  means  that 
at  49  clock-cycles  per  iteration  of  the  one-dimensional  VHDL  Kalman  filter,  a  total 
of  1,225  ns  per  iteration  are  required.  Also,  at  40MHz,  49  clock-cycles  per  iteration 
equates  to  816,  326  iterations  with  each  iteration  corresponding  to  a  potential  target 
being  tracked.  If  data  is  fed  into  the  one-dimensional  VHDL  Kalman  filter  at  a  rate 
of  30  times  per  second  for  each  target,  position  and  velocity  estimations  for  a  total  of 
approximately  27,  210  targets  can  be  performed. 
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64- bit  test  50MHz  Successful  50.00MHz  20  ns  N/A  6.317  Yes 


V.  Conclusions  and  Future  Work 


f  j  ~^his  chapter  contains  conclusions,  and  future  work. 

5.1  Conclusions 

As  warfare  evolves  so  to  must  the  technology  we  use  to  fight  wars.  This  thesis 
has  provided  a  tool  as  well  as  a  model  for  tool  development  that  will  allow  electronics 
designers  to  more  quickly  and  easily  implement  designs  utilizing  a  Kalman  filter.  This 
helps  to  enable  rapid  fielding  of  force  multiplying  technology  to  the  warfighter.  It  was 
demonstrated  that  a  near-optimal  VHDL  Kalman  filter  model  can  be  programmed 
onto  an  inexpensive  FPGA  for  potential  implementation  into  target  tracking  systems. 
Algorithms  provided  in  Matlab®  were  implemented  using  modern  FPGA  synthesis 
tools  to  characterize,  experiment,  and  generate  flexible  VHDL  codes.  Fundamental 
modular  VHDL  building  blocks  were  formulated  that  can  be  used  to  optimize  the 
target  tracking  algorithms  while  considering  speed,  power,  and  area  for  the  targeted 
applications.  The  code  flexibility  means  that  parameters  are  adjustable  allowing  for 
experimentation  of  various  combinations  as  to  optimize  or  tune  the  algorithm  for 
different  applications. 

5.2  Future  Work 

Future  work  for  expansion  of  this  thesis  may  include: 

•  Investigate  the  implementation  of  various  multipliers  for  use  in  the  VHDL 
Kalman  filter. 

•  Error  reduction  through  bit-padding  in  the  Newton-Raphson  square-root  func¬ 
tion. 

•  Data  storage  and  swapping  for  multiple-target  tracking. 

Currently  the  VHDL  Kalman  filter  implementation  relies  on  the  default  process  for 
multiplication  as  defined  by  the  VHDL  compiler.  An  investigation  into  various  RTL 
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multiplication  implementations  should  be  done  and  a  comparison  made  with  the  de¬ 
fault  implementations  of  the  available  VHDL  compilers. 

Currently  there  exists  a  maximum  error  of  0.25  that  can  occur  due  to  the  bit- 
shifting  required  for  the  normalization  and  denormalization  process.  It  is  proposed 
that  future  work  be  done  to  include  bit-padding  in  order  to  preserve  accuracy  during 
this  process. 

The  current  implementation  does  not  address  the  storage  and  swapping  of  data 
into  and  out  of  the  VHDL  Kalman  filter.  I  propose  an  onboard  memory  with  approx¬ 
imately  120,000  (6  X  20,000  targets),  32  or  64  (480,000  or  960,000  bytes)  bit  positions 
used  to  store  x(:,k)  and  P(:,:,k).  The  address  to  access  a  targets  information  comes 
from  outside  the  system  in  the  form  of  a  target  number.  The  target  acquisition  pro¬ 
gram  feeds  both  the  target  number(address)  as  well  as  the  input  z(:,k).  It  may  also  be 
necessary  for  the  target  locating  program  to  send  a  reset  signal  along  with  a  particular 
address  that  will  reset  the  memory  for  the  case  where  a  target  number  is  reassigned  to 
another  target  or  becomes  inactive.  In  the  case  of  a  reassignment  or  a  target  number 
becoming  inactive  the  x  and  P  values  must  be  reset.  When  the  target  number  changes 
it  is  used  as  the  address  to  access  the  last  x  and  P  value  for  that  target.  Those  x 
and  P  values  are  loaded  into  the  secondary  memory  or  if  the  secondary  memory  is 
reduced  to  holding  only  constants  then  the  x  and  P  values  can  be  used  directly  and 
the  x(:,k-|-)  and  P(:,:,k-|-)  will  be  saved  into  the  primary  memory.  This  scheme  or  a 
similar  scheme  would  allow  a  single  linear  Kalman  filter  to  track 
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Appendix  A,  Matlab  Code 


f  j  ~^his  appendix  contains  the  Matlab®  code  written  by  Dr.  Juan  Vasquez. 


Listing  A.l:  This  is  the  main  file  that  implements  the  Kalman  Filter  by  calling 

both  eqniv_discrete.m  as  well  as  KF.m  . 

(appendix2  /  main.m) 


“/o  Main  Routine  -  this  is  my  code  for 

clear 

clc 

r andn ( ^  seed  ^  ,0)  ; 
cleanup=l ; 
gent ruth= 1 ; 
nruns  =  1 ; 
run=l ; 
t _f inal =50 ; 
dt=. 1; 

t  =0 : dt : t  _f inal ; 
t_last =length (t ) ; 


“/o  l  =  generate  truth  data, 
“/o  #  of  monte  carlo  runs 
%  single  run  to  plot 
%  Final  time 
“/o  Sampling  interval 
“/o  time  vector 


generating  truth 


0=do  not 


7o  Model  parameters 

%  Truth  model  -  constant  components 
R_t  =  10;  %  Measurement  noise  covariance 

Q_t  =  100;  %  Dynamics  noise  strength 

G_t  =  [0;  1];  7o  Noise  injection  model 

H_t  =  [1  0]  ;  “/o  Measurement  model 

B_t  =  [0;  1]  ; 

F_t  =  [0  1;  0  0]  ; 

[phi_t  ,  Bd_t  ,  Qd_t  ]  =  equi  v_di s  cr  et  e  ( F_t  ,  B_t  ,  G_t  ,  Q_t  ,  dt  )  ; 
Gd_t  =  eye  (2)  ; 


7o  True  Initial  conditions 
7o  State=[x  v]  =  [position  velocity] 
x_t  (  :  ,  1)  =  [1  1]  ; 

z(l  ,l)=x_t (1  ,1)  ; 


7o  Filter  Model  parameters 

R  =  10;  7o  Measurement  noise  covariance 
Q  =  100;  7o  Dynamics  noise  strength 
G  =  [0;  1];  7o  Noise  injection  model 

H  =  [10];  7o  Measurement  model 

B  =  [0;  1]  ; 

F  =  [0  1;  0  0]  ; 

[phi,  Bd  , Qd] = equi v_di s cr et e (F , B , G , Q , dt ) ; 

Gd=  eye  (2)  ; 

7o  Noise  injection  matrices  for  truth  model  simulation 
dynNoise_t  =  sqrt(Qd_t)^; 
measNoise_t  =  sqrt(R_t)^; 

7o  Begin  Monte  Carlo  runs 
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if  gentruth==l 

for  i = 1 : nruns 

“/o  Generate  truth  data 
f or  k  =  2 : t _last 

“/o  Generate  noise  -  corrupt ed  states 
noise  =  dynNoise_t *randn  (2 , 1)  ; 
x_t(:,k)=  phi_t *x_t ( : , k-1)  +  noise  ; 

“/o  Generate  noise  -  corrupt  ed  measurement 
z(:,k)  =  H_t*x_t(:,k)  +  measNoise_t *randn ; 

end 

“/o  Save  truth  data 

eval([^save  run ^  num2str(i)  ^  x_t  z  t_final  dt ^ ] ) 

end 

end 

for  i=l : nruns 
clear  x 

%  Filter  Initial  conditions 

“/o  State=[x  v]  =  [position  velocity] 

X  (  :  , 1)  =  [1  1]  ; 

P= . 25*eye  (2)  ; 

KF  “/o  Kalman  Filter  subroutine 

end 

for  i=l : nruns 

%  Generate  error  statistics 
eval ( [ ^ load  run ^  num2str(i)  ^f^  ]) 

eval ( [ ^ load  run ^  num2str(i)  ]) 
for  j=l:2  %  2  states 

e(j,:,i)=  x_t(j,:)  -  x(j,:); 

end 

res(: ,i)=residual; 

end 

“/o  Ensemble  stats 

Me=mean ( e , 3) ;  %  mean  error  over  number  of  runs  (dimension  3) 

sigma_e  =  std  (e  ,  0 , 3)  ;  “/o  standard  dev  of  error  over  number  of  runs  (... 

dimension  3) 

“/o  Since  the  noise  stats  are  stationary,  all  of  the  gains  and  ... 
filter 

“/o  computed  covariance  values  are  the  same  for  each  run,  so  just  ... 
use  the 

%  last  run  data  that  was  just  loaded  above 

%  Temporal  stats  of  ensemble  data 

Me_t ime  =  mean (Me  ,2) 

sigma_e_t ime  =  mean ( sigma_e  ,2)  ; 

f igure  (1) 
for  j  = 1 : 2 

subplot  (2,l,j)  ,plot(t,Me(j  ,  :)  ,  ^  r  ^  ,  .  .  . 
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t , Me ( j  ,  :  ) +  3*  sigma. e  ( j  ,  :  )  ,  '  b  '  ,  .  .  . 
t , Me ( j  ,  : ) -  3*  sigma. e ( j  ,  : )  ,  '  b  '  ,  .  .  . 
t ,3*sigma.f (j  ,  :)  ,  'm:  '  ,  .  .  . 
t , -3*  sigma. f  (  j  ,  : )  ,  ' m :  O 

end 

figure  (1)  , subplot  (211) 

titleC^State  Estimates  and  Covariances  [x  v  ]  O 

Yo  Load  a  specific  single  run  data  file  and  plot 
eval ( [ ^ load  run ^  num2str(run)  ^f^  ]) 

eval (  [  ^ load  run ^  num2str(run)  ]) 

figure  (2)  %  last  run  data  only 

subplot  (211)  ,plot(t,x(l  ,  :)  ,  ^rO  ;hold  on; 
plot  (t  ,  x.t  ( 1  ,:));  hold  off; 
title(  ^PositionO 

subplot  (212)  ,plot(t,x(2,  :)  ,  ’ r ’ )  ;hold  on; 
plot (t , x.t  (2  ,  : ) )  ; hold  off; 
title ( ^Velocity  ^ ) 

Yo  Residual  data 

Mr  =  mean  ( res  ,  3)  ;  Yo  mean  residual 
s igma.r  =  std  ( res  ,  0 , 3)  ;  Yo  std  of  residual 

s  igma.r  .f  ( 1  ,  :  )  =  sqrt  ( A  ( 1  ,  :  )  )  ;  Yo  filter  computed  std  of  residual 

f igure  (3) 

plot (t , Mr  ( 1  ,  : )  ,  ' r '  ,  .  .  . 

t,-3*sigma.r(l,  :)  ,  ^b^  ,  .  .  . 
t,3*sigma.r(l,  :)  ,  ^b^  ,  .  .  . 
t,-3*sigma.r.f  ,  ^m^  ,  .  .  . 
t,3*sigma.r.f  ,  ^mO  ,title(  ^Residual  O 

Yo  AANES  Calculations 
for  i  =  1 : nruns 
for  j  =  1 : t .last 

ness(j,i)  =  (e(:,j,i)  '  *inv (P (:,:,j))*e(:,j,i))/2; 
end 
end 

aness  =  sum (ness , 2) /nruns ; 
f igure (4) 

plot  (aness)  ,title(^ANESSO  ; 

Yo  File  Clean-Up 
if  cleanup==l 

for  i=l : nruns 

eval ( [ ^ delet e  run ^  num2str(i)  ^ f . mat ^  ]) 

eval ( [ ^ delet e  run ^  num2str(i)  ^ . mat ^ ] ) 

end 

end 


Listing  A. 2:  This  the  Kalman  Filter  algorithm. (appendix2/KF.m) 


Yo  Load  truth  data  and  other  parameters 
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eval ( [ ^ load  run ^  num2str(i)  ]) 
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for  k=2 : t _last 

x( :  ,k)=  phi*x(  :  ,k-l)  ; 

P(:,:,k)=  phi *P ( : , : , k-1) *phi ^  +  Qd; 

A(  :  ,k)=  H*P( :  ,  :  ,k) +  R; 

K=P(: , : ,k)*H^*(inv(A(: ,k))) ; 
r esidual  (  :  , k) =  z(:,k)  -  H*x(:,k); 

x(:,k)=  x(:,k)  +  K*residual ( : , k) ; 

P(: , : ,k)=  P(: , : ,k)  -  K*H*P(: , : ,k) ; 

sigma_f ( : , k) =sqrt (diag (P ( : , : , k) ) ) ; 

KK(: ,k)=K; 
end  %  End  time  loop 

“/o  Save  results 

eval ( [ ^ save  run ^  num2str(i)  x  P  sigma_f 


A  residual ^ ] ) 
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Appendix  B.  Behavioral  Kalman  Filter  in  VHDL 


This  appendix  contains  the  VHDL  code  that  describes  the  Kalman  Filter  behav- 
iorally. 


Listing  B.l: 


--Entity  :  kf_top 

--Description  :  This  is  a  behavioral  description  of  the  Kalman 
--Filter  as  described  by  Dr.  Juan  Vasquez  in  Matlab. 

--Inputs  : 

--NAME  TYPE  DESC 

--z_position  Signed  input 
--reset  Std_logic  reset 

--Outputs  : 

--NAME  TYPE  DESC 

--position  Signed  position  value 
--velocity  Signed  velocity  value 


Library  IEEE; 

Use  IEEE . std_logic_1164 . All ; 

Use  IEEE . numeric_std . All ; 

Package  matrix_types  Is 

Type  matrix_2x2  Is  Array  (1  to  2,  1  to  2)  Of 

Signed  (31  Downto  0)  ; --row  x  column 

Type  matrix_2xl  Is  Array  (1  to  2)  Of 
Signed  (31  Downto  0)  ; --row  x  column 

Type  matrix_lx2  Is  Array  (1  to  2)  Of 
Signed  (31  Downto  0)  ; --row  x  column 
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End  Package  mat r ix_types ; 


Library  IEEE; 

Use  IEEE . std_logic_1164 . All ; 

Use  IEEE . numeric_std . All ; 

Use  work . matrix_types . All ; 

Use  std . text io . All ; 

Entity  kf_top  Is 

Port ( z_pos it  ion  :  In  Signed(31  Downto  0); 
reset  :  In  Std_logic ; 

position,  velocity  :  Out  Signed(31  Downto  0)); 

End  kf _t op ; 

Architecture  kf_behav  Of  kf_top  Is 

Function  to  multiply  two  2x2  matrices  * 

Function  mat r ix_mult _2x2  (A,B:  matrix_2x2) 

Return  matrix_2x2  Is 
Variable  result  :  matrix_2x2 ; 

Variable  func_templ  :  Signed(63  Downto  0)  := 

(OTHERS  =>  ’O’) ; 

Begin--Begin  function  code . 

For  i  In  1  to  2  Loop 

For  L  In  1  to  2  Loop 

For  j  In  1  to  2  Loop 

func_templ  :=  ( A ( i , j ) *B  ( j  , L) )  + 

func_templ ; 

End  Loop  ; 

result (i,L)  :=  func_templ (47  Downto  16); 

func_templ  := 
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(OTHERS  =>  ’O’) ; 


End  Loop  ; 

End  Loop  ; 

Return  result ; 

End  mat r ix_mult _2x2 ; 

Function  to  add  two  2x2  matrices  * 

Function  mat r ix_add_2x2  (A,B:  matrix_2x2) 

Return  matrix_2x2  Is 
Variable  result  :  matrix_2x2 ; 

Begin--Begin  function  code . 

For  i  In  1  to  2  Loop 

For  j  In  1  to  2  Loop 

result(i,j)  :=  A ( i , j ) +B ( i , j ) ; 

End  Loop  ; 

End  Loop  ; 

Return  result ; 

End  mat r ix_add_2x2 ; 

Function  to  add  a  scalar  to  a  2x2  matrix  * 

Function  matr ix_add_int _2x2  (A:  matrix_2x2  ;B:  Signed(31 

0)) 

Return  matrix_2x2  Is 
Variable  result  :  matrix_2x2 ; 

Begin--Begin  function  code . 

For  i  In  1  to  2  Loop 

For  j  In  1  to  2  Loop 

result(i,j)  :=  A(i,j)+B; 

End  Loop  ; 

End  Loop  ; 

Return  result ; 


Downt o 
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End  matr ix_add_ int _2x2 ; 


Function  to  multiply  a  2x2  with  a  2x1  matrix  * 

Function  mat r ix_mult _2x2_2x 1  (A:  matrix_2x2  ;B:  matrix_2xl) 

Return  matrix_2xl  Is 
Variable  result  :  matrix_2xl ; 

Variable  func_templ  :  Signed(63  Downto  0)  := 

(OTHERS  =>  ^0  O  ; 

Begin--Begin  function  code . 

For  i  In  1  to  2  Loop 

For  j  In  1  to  2  Loop 

func_templ  :=  ( A ( i , j ) *B  (  j ) )  +  func_templ ; 

End  Loop  ; 

result (i)  :=  f unc_t emp 1  (47  Downto  16); 

func_templ  := 

(OTHERS  =>  ^0  O  ; 

End  Loop  ; 

Return  result ; 

End  mat r ix_mult _2x2_2x 1 ; 

Function  to  multiply  a  1x2  with  a  2x2  matrix  * 

Function  mat r ix_mult _ Ix2_2x2  (A:  matrix_lx2  ;B:  matrix_2x2) 

Return  matrix_lx2  Is 
Variable  result  :  matrix_lx2  ; 

Variable  func_templ  :  Signed(63  Downto  0)  := 

(OTHERS  =>  ’O’) ; 

Begin--Begin  function  code . 

For  L  In  1  to  2  Loop 

For  j  In  1  to  2  Loop 

func_templ  :=  ( A ( j ) *B  (  j  ,  L) )  +  func_templ ; 

End  Loop  ; 
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result (L)  :=  f unc_t emp 1  (47  Downto  16); 

func_templ  := 

(OTHERS  =>  ^0  O  ; 

End  Loop  ; 

Return  result ; 

End  mat r ix_mult _ Ix2_2x2 ; 

Function  to  multiply  a  1x2  with  a  2x1  matrix  * 

Function  mat r ix_mult _ lx2_2x 1  (A:  matrix_lx2  ;B:  matrix_2xl) 

Return  Signed  Is 

Variable  result  :  Signed(63  Downto  0)  := 

(OTHERS  =>  ^0  O  ; 

Begin--Begin  function  code . 

For  j  In  1  to  2  Loop 

result  :=  (A(j)*B(j))  +  result; 

End  Loop  ; 

Return  result  (47  Downto  16)  ; 

End  mat r ix_mult _ lx2_2x 1 ; 

Function  to  multiply  a  2x1  and  a  1x2  matrix  * 

Function  mat r ix_mult _2x 1 _ 1x2  (A:  matrix_2xl  ;B:  matrix_lx2  ) 

Return  matrix_2x2  Is 
Variable  result  :  matrix_2x2 ; 

Variable  func_templ  :  Signed  (63  Downto  0); 

Begin--Begin  function  code . 

For  i  In  1  to  2  Loop 

For  L  In  1  to  2  Loop 

func_templ  :=  (A(i)*B(L)); 

result (i,L)  :=  func_templ  (47  Downto  16); 

End  Loop  ; 

End  Loop  ; 
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Return  result  ; 

End  mat r ix_mult _2x 1 _ 1x2  ; 

Function  to  multiply  a  2x1  matrix  and  a  scalar  * 

Function  matr ix_mult _2xl_int  (A:  matrix_2xl  ;B:  Signed(31  Downto 

0)) 

Return  matrix_2xl  Is 
Variable  result  :  matrix_2xl ; 

Variable  func_templ  :  Signed  (63  Downto  0); 

Begin--Begin  function  code . 

For  i  In  1  to  2  Loop 

func_templ  :=  A(i)*B; 

result  (i)  :=  f unc_t emp 1  (47  Downto  16); 

End  Loop  ; 

Return  result ; 

End  matrix_mult _2xl_int ; 

Function  to  add  a  2x1  to  a  2x1  matrix  * 

Function  matr ix_add_2xl_2xl  (A:  matrix_2xl  ;B:  matrix_2xl) 

Return  matrix_2xl  Is 
Variable  result  :  matrix_2xl ; 

Begin--Begin  function  code . 

For  i  In  1  to  2  Loop 

result(i)  :=  A(i)+B(i); 

End  Loop  ; 

Return  result ; 

End  matr ix_add_2xl_2xl ; 

Function  to  subtract  two  2x2  matrices  * 
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Function  matr ix_subtract _2x2  (A,B:  matrix_2x2) 

Return  matrix_2x2  Is 
Variable  result  :  matrix_2x2 ; 

Begin--Begin  function  code . 

For  i  In  1  to  2  Loop 

For  j  In  1  to  2  Loop 

result(i,j)  :=  A ( i , j ) -B ( i , j ) ; 

End  Loop  ; 

End  Loop  ; 

Return  result ; 

End  matrix_subtract _2x2 ; 

Function  to  return  the  diagonal  (diag)  of  a  2x2  matrix  * 

Function  diag_2x2  (A:  matrix_2x2) 

Return  matrix_2xl  Is 
Variable  result  :  matrix_2xl ; 

Begin--Begin  function  code . 

For  i  In  1  to  2  Loop 

result(i)  :=  A(i,i); 

End  Loop  ; 

Return  result ; 

End  diag_2x2 ; 

Begin  main  process  .  * 

Begin 

Process  (z_position,  reset)  Is 

Constant  dt  :  Signed  (31  Downto  0)  :  = 

"00000000000000000001100110011001"; 

--R  is  the  measurement  noise  covariance . 
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Constant  R  :  Signed(31  Downto  0)  := 

"00000000000010100000000000000000  "  ; 

--Q  is  "dynamic  noise  strength"  (process  noise  ... 
covariance ) . 

Constant  Q  :  Signed(31  Downto  0)  := 

"00000000011001000000000000000000"; 

--G  is  the  noise  injection  model. 

--This  was  intended  to  be  2  rows ,  1  column  but  is  .  .  . 

represented  as 
--1  row,  2  columns. 

Constant  G  :  matrix_2xl  := 

("000  0000000000000000000000000  0000  "  , 

"0000  000000000001000000000000  0000  ")  ; 

--This  was  intended  to  be  2  rows ,  1  column  but  is  .  .  . 

represented  as 
--1  row,  2  columns. 

Constant  B  :  matrix_2xl  := 

("000  0000000000000000000000000  0000  "  , 

"0000  000000000001000000000000  0000  ")  ; 

--This  was  intended  to  be  2  rows ,  1  column  but  is  .  .  . 

represented  as 
--1  row,  2  columns. 

Constant  Bd  :  matrix_2xl  := 
("00000000000000000000000101000111", 
"00000000000000000001100110011001"); 

Constant  H  :  matrix_lx2  := 

("000  0000000000001000000000000  0000  "  , 

"0000  000000000000000000000000  0000  ")  ; 

Constant  H_prime  :  matrix_2xl  := 
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("000  0000000000001000000000000  0000  "  , 
"0000  000000000000000000000000  0000  ")  ; 

Constant  F  :  matrix_2x2  := 

( ( " 00000000000000000000  000000000000 "  , 
"0000  000000000001000000000000  0000  ")  , 
("000  0000000000000000000000000  0000  "  , 
"0000  000000000000000000000000  0000") )  ; 


Constant  phi  :  matrix_2x2  := 
(("00000000000000010000000000000000", 
"0000  0000000000000001100110011001  ")  , 
("000  0000000000000000000000000  0000  "  , 
"0000  000000000001000000000000  0000")  )  ; 


Constant  phi_prime  :  matrix_2x2  := 
(("00000000000000010000000000000000", 
"0000  000000000000000000000000  0000  ")  , 
("000  00000000000000001100110011001  "  , 
"0000  000000000001000000000000  0000")  )  ; 

Constant  Qd  :  matrix_2x2  := 

( ( " 00000000000000000000100010000110  "  , 
"0000  000000000000100000000000  0000  ")  , 
("000  0000000000000100000000000  0000 "  , 
"0000  000000001010000000000000  0000")  )  ; 

Constant  Gd  :  matrix_2x2  := 
(("00000000000000010000000000000000", 
"0000  000000000000000000000000  0000  ")  , 
("000  0000000000000000000000000  0000  "  , 
"0000  000000000001000000000000  0000")  )  ; 


Definition  of  Variables  * 
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Variable  x  :  matrix_2xl  := 

("000  0000000000001000000000000  0000  "  , 
"0000  000000000001000000000000  0000  ")  ; 

Variable  P  :  matrix_2x2  := 

( ( " 00000000000000000100  000000000000  "  , 
"0000  000000000000000000000000  0000  ")  , 
("000  0000000000000000000000000  0000  "  , 
"0000  000000000000010000000000  0000")  )  ; 

Variable  A  :  Signed(31  Downto  0); 
Variable  residual  :  Signed(31  Downto  0) ; 
Variable  K  :  matrix_2xl ; 

Variable  K_temp  :  Signed(33  Downto  0); 


Begin 


If  reset  =  ^  1  ^  Then 

X  :  = 

(" 00000000000000010  000000000000000  "  , 
" 000000000000000100000000  00000000  ")  ; 


P  :  = 

(( "0000000000  0000000100000000000000  "  , 
" 000000000000000000000000  00000000  ")  , 
(" 00000000000000000  000000000000000  "  , 

" 000000000000000001000000  00000000  ")  )  ; 

End  If; 


X  :=  mat r ix_mult _2x2_2x 1 (phi , x) ; 

P  :=  matr ix_add_2x2 ( ( matr ix_mult _2x2  (  . .  . 

matr ix_mult _2x2 (phi ,P) ,phi_prime)) ,Qd) ; 
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Appendix  C.  VHDL^  RTL  Kalman  Filter  Implementation  Entities 
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1  1  ^his  appendix  contains  the  VHDL,  RTL  Kalman  filter  implementation  entities. 

Listing  C.l: 

Entity  add_behavi or al  Is 

generic  (high_bit  :  natural  :=  31);  --This  is  the  highest 

--order  bit 

Port  (reset  : 

In  std_logi c ; 

in_a  : 

In  s igned ( high_bit  Downto  0); 

in_b  : 

In  s igned ( high_bit  Downto  0); 

output  : 

Out  signed (high_bit  Downto  0); 

c_out  : 

Out  std_logi c ) ; 

End  add_behavi or al  ; 

Listing  C.2: 
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Entity  add_ sub_behavi or al  Is 

generic  (high_bit  :  natural  :=  31);  --This  is  the  highest 

--order  bit 


End 


Port  (add_sub 

--0  add  in_A 
--1  subtract 
reset 
in_a 
in_b 
output 
c_out 

add_ sub_behavi or al ; 


In  std.logic; 
to  in_B 

in_B  from  in_A 
In  std.logic; 

In  s igned ( high_bit  Downto  0); 
In  s igned ( high_bit  Downto  0); 
Out  signed ( high_bit  Downto  0) ; 
Out  std_logi c ) ; 
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Entity  ALU_2x2  Is 

Generic (high_bit 
:=  16)  ; 


Listing  C.3: 


natural  :=  31;  f r act i on_ s ize  :  natural 

:  In  s igned ( high_bit  Downto  0) ; 
:  In  s igned ( high_bit  Downto  0) ; 
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Port(A_00,  A_01 ,  A_10,  A_ll 
B_00 ,  B_01 ,  B_10,  B_ll 
reset  :  In  std_logic ; 
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add_sub  :  In  std_logic; 
reg_load  :  In  std_logic ; 
mux_control  :  In  std_logic ; 
elk  :  In  std_logic; 

C_00 ,  C_01  ,  C_10  ,  C_ll  :  Out  s igned ( high_bit  Downto  0)  ; 
overflow  :  Out  std_logic); 

End  ALU_2x2; 


Listing  C.4: 
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Entity  controller  Is 

PortCclk,  reset  :  In  std_logic ; 

mem_control ,  ALU_1 ,  ALU_2  :  out  signed (3  Downto  0); 
mux_l  ,  mux_2  ,  output _reg_load  ,  recipr ocal_r eset  , 
reciprocal_load  ,  recipr ocal_mux_control  :  out  st d_logi c ) . .  . 

3 

End  controller 


Listing  C.5: 
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Entity  controller _2D  Is 

PortCclk,  reset .external  :  In  std.logic ; 

reset  ,  load.main  ,  load_sp_normhold  ,  mux.control  , 
load_sp_l  ,  load_sp_reg5  :  out  std.logic); 

End  controller _2D  ; 


Listing  C.6: 
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Entity  denormalization  Is 

Generic (high.bit  :  natural  :=  31;  f r act i on_ s ize  :  natural 
:=  16)  ; 

Port  (data.in  :  In  s igned ( high.bit  Downto  0); 
sign.flag  :  In  std.logic ; 
shift.value  :  In  integer; 

shif t.direct ion  :  In  std.logic ; ^ 0 ^  for  right  and 

^ 1 ^  for  left 

elk  :  In  std.logic; 
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10 


data_out  :  Out  signed ( high_bit  Downto  0) ) ; 


11 


End 


denormalization ; 


Listing  C.7: 


1 

2 

3 

4 

5 

6 

7 

8 
9 


1 

2 

3 

4 

5 

6 

7 

8 
9 

10 

11 


Entity  KF_RTL_top  Is 

Gener i c ( high_bit  : 

natural  :=  31;  f r act ion_s ize  :  natural  :=  ... 

16)  ; 

Port ( elk  ,  reset  : 

In  std_logic ; 

input  :  In  s igned ( high_bit  Downto  0) ; -- Otherwi s e  known  as  z 

x_out 1  ,  x_out 2 

:  Out  s igned ( high_bit  Downto  0); 

overflowl  :  Out 

std_logi c ; 

overflow2  :  Out 

std_logi c ; 

overf low_reciprocal  :  Out  std_logic); 

End  KF_RTL_t op  ; 

Listing  C.8: 

Entity  mem  Is 

Generic (high_bit 

natural 

:=  31) 

3 

Port ( elk  ,  reset 

In  std_ 

logic  ; 

control  :  In 

s 

igned  (3 

Downto 

0)  ; 

inl.OO 

,  ini  _ 

01 

,  inl.lO 

,  inl_ 

11  : 

In 

signed  (high_bit 

Downto  0) 

) 

in2_00 

,  in2_ 

01 

,  in2_10 

,  in2_ 

11  : 

In 

signed  (high_bit 

Downto  0) 

} 

> 

1 

o 

o 

A_01  , 

A_ 

10,  A_ll 

:  Out 

signed 

(high_bit  Downto 

0)  ; 

o 

o 

1 

PQ 

B_01  , 

B_ 

10,  B_ll 

:  Out 

signed 

(high_bit  Downto 

0)  ; 

O 

1 

O 

o 

C_01  , 

C_ 

10,  C_ll 

:  Out 

signed 

(high_bit  Downto 

0)  ; 

D_00  , 

D_01  , 

D_ 

10,  D_ll 

:  Out 

signed 

(high_bit  Downto 

0))  ; 

End  entity  mem; 

Listing  C.9: 


1 

2 

3 


Entity  mult _behavi or al  Is 

Generic  (high_bit  :  natural  :=  31; 

f ract ion_size  :  natural  :=  16);  --This  is  the 
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4 


--highest  order  bit 


5 

6 

7 

8 
9 

10 


End 


Port  (in_a 
in_b 

clear _async 

product 

c_out 

mult .behavioral ; 


In  signed ( high.bit  Downto  0); 

In  s igned ( high.bit  Downto  0); 
In  std.logic ; 

Out  s igned ( high.bit  Downto  0) ; 
Out  std.logi c ) ; 


Listing  C.IO: 


Entity  mux 

Is 

Gener i c ( high.bit  :  natural  : 

=  31)  ; 

Port (control  :  In  std.logic ; 

A 

:  In  s igned ( high.bit 

Downto 

0)  ; 

B 

:  In  signed (high.bit 

Downto 

0)  ; 

C 

:  Out  s igned ( high.bit 

Downt  0 

0))  ; 

End  entity 

mux  ; 

Listing  C.ll: 


Entity  mux_2_to_l  Is 

Gener i c ( high.bit  :  natural  : 

:=  31)  ; 

Port (control  :  In  std_ 

.logic  ; 

> 

1 

o 

o 

> 

1 

o 

> 

1 

o 

A_ll 

:  In  s igned ( high.bit 

Downt  0 

0)  ; 

o 

1 

PQ 

O 

1 

PQ 

O 

O 

1 

PQ 

1 

PQ 

:  In  signed ( high.bit 

Downto 

0)  ; 

c_00 ,  C_01,  C_10, 

C_ll 

:  Out  s igned ( high.bit 

Downto 

0))  ; 

End  entity  mux_2_to_l; 

Listing  C.12: 


1 

2 

3 

4 


Entity  mux_4_to_2  Is 

Gener i c ( high.bit  :  natural  :=  31); 
Port (control  :  In  std.logic ; 

A0_00  ,  A0_01,  A0_10,  A0_11  :  In 


signed ( high.bit 


Downt  o 


0)  .  .  . 


5 


A1_00  ,  Al_01  ,  Al_10 ,  Al_ll  :  In  s igned ( high.b it  Downto  0)  . .  . 


6 


7 

8 

9 

10 


1 

2 

3 

4 

5 

6 

7 

8 
9 

10 

11 


1 

2 

3 

4 

5 

6 


1 

2 


B0_00  ,  B0_01  ,  B0_10  ,  B0_11  :  In  s igned  ( high_b it  Downto  0)  .  .  . 

3 

B1_00  ,  Bl_01  ,  Bl_10 ,  Bl_ll  :  In  s igned ( high_b it  Downto  0)  . .  . 

3 

C0_00  ,  C0_01  ,  C0_10  ,  C0_11  :  Out  s igned ( high_bit  Downto  ... 

0)  ; 

C1_00  ,  Cl_01  ,  Cl_10  ,  Cl_ll  :  Out  s igned ( high_bit  Downto  ... 

0))  ; 


End  entity  mux_4_to_2 ; 


Listing  C.13: 

Entity  normalizat ion_sqrt  Is 

Generic (high_bit 

:  natural  :=  31;  fraction 

_size 

:  natural 

:=  16)  ; 

Port  (data_in  :  In  s igned ( high_bit  Downto 

0)  ; 

data_out  : 

Out  s igned ( high_bit  Downto  0) ; 

sign_flag  : 

Out  std_logic ; 

shift _value 

:  Out  integer ; 

shif t_direct ion  :  Out  st d_logi c ; ^ 0 

^  for 

right  and 

--  U  ^ 

for 

left 

evenO_oddl 

:  Out  std_logic ) ; 

End  normal izat i on_ sqrt 

3 

Listing  C.14: 

Entity  NR_LT_R0M  Is 

Generic  (high_bit 

:  natural  : =  31; 

fraction 

_size  :  natural  :=  16); 

Port (address  :  In 

Signed(7  Downto  0); 

estimate  :  Out  Signed ( high_bit  Downto 

0))  ; 

End  NR_LT_R0M; 

Listing  C.15: 

Entity  reciprocal_stage 1  Is 


Generic (high_bit  :  natural  :=  31;  f r act i on_ s ize  :  natural 
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3 

4 

5 

6 

7 

8 
9 

10 


:=  16)  ; 

Port  (data_in  :  In  s igned ( high_bit  Downto  0); 

data_out  :  Out  signed ( high_bit  Downto  0); 
sign_flag  :  Out  std_logic; 
shift_value  :  Out  natural; 

shif t_direct ion  :  Out  std_logic ) ; ^ 0 ^  for  right  and 

^ 1 ^  for  left 

End  reciprocal_st age  1  ; 


Listing  C.16: 


Entity  reciprocal_stage3 

Is 

Gener ic (high_bit  : 

natural  :=  31;  f r act i on_ s ize  :  natural 

:=  16)  ; 

Port  (data_in  :  In 

s igned ( high_bit  Downto  0); 

mult2_out  :  Out  signed ( high_bit  Downto  0); 

square_out  : 

Out  s igned ( high_bit  Downto  0)); 

End  reciprocal_st age3 ; 

Listing  C.17: 


Entity  reciprocal_stage6 

Is 

Gener ic (high_bit  : 

natural  :=  31;  fraction 

_size  :  natural 

:=  16)  ; 

Port  (data_in  :  In 

signed ( high_bit  Downto 

0)  ; 

sign_flag  :  In 

std_logic ; 

shift_value  : 

In  natural ; 

shif t_direct ion  :  In  std_logic ; ^ 0 ^ 

for  right  and 

--  ^  1 

^  for  left 

elk  :  In  std_logic; 

data_out  :  Out 

signed ( high_bit  Downto 

0))  ; 

End  reciprocal_st age6 ; 

Listing  C.18: 

1 

2 

3 

4 

5 

6 

7 

8 
9 

10 

11 


Entity  reciprocal_top  Is 

Gener i c ( high_bit  :  natural  :=  31;  f r act ion_s ize 
16)  ; 


natural  := 


2 


3 


4 

5 

6 
7 


Port  (elk,  reset,  load,  mux_control  :  In  std_logic ; 
data_in  :  In  signed (high_bit  Downto  0); 
data_out  :  Out  s igned ( high_bit  Downto  0); 
overflow  :  Out  std_logic); 

End  reciprocal_t op ; 


1 

2 

3 

4 

5 

6 
7 


Listing  C.19: 


Entity  reg_alu  Is 

Generic (high_bit 


natural 


31)  ; 


Port ( elk 
load 
d 

q 


In  std_logi c ; 

In  std_logi c ; 

In  signed (high_bit  Downto  0); 
Out  signed ( high_bit  Downto  0) ) ; 


End  entity  reg_alu; 


1 

2 

3 

4 

5 

6 


Listing  C.20: 

Entity  RS_by_one  Is 

Generic (high_bit  :  natural  :=  31;  f r act i on_ s ize  :  natural 
:=  16)  ; 

Port  (data_in  :  In  signed ( high_bit  Downto  0); 

data_out  :  Out  signed ( high_bit  Downto  0)); 

End  RS_by_one ; 


Listing  C.21: 


Entity  sqrt_R0M  Is 

Generic  (high_bit 

:  natural  :=  31; 

f  r act i on_ 

size  :  natural  :=  16) ; 

Port (address  :  In 

Signed(7  Downto  0); 

estimate  :  Out  Signed ( high_bit  Downto  0)); 

End  sqrt  _R0M ; 

Listing  C.22: 

Entity  sqrt _R0M_even  Is 

Generic  (high_bit 

:  natural  : =  31; 
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3 

4 

5 

6 


f r act i on_ s ize  :  natural  :=  16); 

Port ( address  :  In  Signed(7  Downto  0); 

estimate  :  Out  Signed ( high_bit  Downto  0) ) ; 
End  sqrt _R0M_even ; 


1 

2 

3 

4 

5 

6 


1 

2 

3 

4 

5 

6 

7 

8 
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1 

2 

3 

4 

5 

6 

7 

8 
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Listing  C.23: 


Entity  sqrt_R0M_odd 

Is 

Generic  (high_ 

bit 

:  natural  :=  31; 

fraction. 

.size  :  natural  := 

16)  ; 

Port ( address  : 

In 

Signed(7  Downto  0) 

} 

estimate 

:  Out  Signed ( high_bit 

Downto  0) ) ; 

End  sqrt _R0M_odd ; 

Listing  C.24: 

Entity  squar ed_behavi or al  Is 

Generic  (high_bit 

:  natural  :=  31; 

f  r act i on_ 

size  :  natural  :=  16) ;  --This  is  the 

--highest  order  bit 

Port  (in_a  : 

In  signed ( high_bit  Downto  0); 

clear _async 

:  In  std_logic ; 

product 

:  Out  s igned ( high_bit  Downto  0); 

c_out 

:  Out  std_logi c ) ; 

End  squar ed_behavioral ; 

Listing  C.25: 

Entity  sub_behavi or al  Is 

generic  (high_bit  : 

natural  :=  31);  --This  is  the  highest 

--order  bit 

Port  (reset 

:  In 

std_logi c ; 

in_a 

:  In 

s igned ( high_bit  Downto  0); 

in_b 

:  In 

s igned ( high_bit  Downto  0); 

output 

:  Out 

s igned ( high_bit  Downto  0); 

c_out 

:  Out 

std_logi c ) ; 

End  sub_behavior al ; 
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Listing  C.26: 


Entity  sub.const .behavioral  Is 

gener ic (high. bit  :  natural  :=  31; 

fraction. size  : 

natural  : =  16) ; 

Port (reset  : 

In  std. logic; 

in. a  : 

In  s igned ( high. bit  Downto  0); 

output  : 

Out  signed ( high. bit  Downto  0); 

c.out  : 

Out  std. logic) ; 

End  sub. const .behavi or al ; 
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Listing  C.27: 

Entity  TwoD_connect_t op  Is 

Gener i c ( high_bit  :  natural  :=  31;  f r act ion_s ize  :  natural  :=  ... 
16)  ; 

PortCclk,  reset,  reset _TwoD_top_controller  :  In  std_logic; 

input_zl  :  In  s igned ( high_b it  Downto  0); --Input  from  the  x-... 
coordinate 

--Kalman  filter. 

input_z2  :  In  signed ( high_bit  Downto  0); --Input  from  the  y-... 
coordinate 

--Kalman  filter. 

output  :  Out  signed (high_bit  Downto  0); 
overflow  :  Out  std_logic ; 

output_posit ionl  :  Out  s igned ( high_bit  Downto  0); 
output _po s it i on2  :  Out  s igned ( high_bit  Downto  0)); 

End  TwoD_connect_t op ; 


1 

2 

3 

4 

5 


Listing  C.28: 


Entity  TwoD_top  Is 

Gener i c ( high_bit  :  natural  :=  31;  f r act ion_s ize  :  natural  :=  ... 
16)  ; 

PortCclk,  reset .external  :  In  std.logic ; 

input.x  :  In  s igned ( high.bit  Downto  0); --Input  from  the  x-... 
coordinate 


--Kalman  filter. 
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6 


input_y  :  In  signed (high_bit  Downto  0); --Input  from  the  y-... 


coordinate 


7 

8 
9 

10 


End 


--Kalman 

output  :  Out  s igned ( high_bit  Downto  0); 
overfIow_2D  :  Out  std_Iogic); 

TwoD_t op ; 


filter . 


Listing  C.29: 


1 

2 

3 

4 


Entity  TwoD_t op_controIIer  Is 

PortCcIk,  reset  :  In  std_Iogic ; 

reset .external  :  out  std.logic); 
End  TwoD.t op.controller ; 
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Appendix  D.  Design  Schematics 

his  appendix  contains  schematics  for  some  of  the  VHDL  Kaiman  fiiter  moduies. 
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CD 
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Figure  D.l:  Schematic  for  the  Newton  Raphson  reciprocal  function. 


TWOD  TOP  CONTROLLER1 
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Figure  D.3:  Bottom  level  schematic  for  the  two-dimensional  Kalman  filter  implementation. 
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dimensional  VHDL  implementation  of  the  Kalman  filter.  The  two-dimensional  Kalman  filter  expands  on  the  one-dimensional 
implementation  allowing  for  the  traeking  of  targets  on  a  real-world  Cartesian  eoordinate  system. 
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